Studies in Job Evaluation: II. The Adequacy of Abbreviated
Point Ratings for Hourly-Paid Jobs in Three
Industrial Plants


C. H. Lawshe, Jr. 
Division of Education and Applied Psychology, Purdue University 


The growing importance of job evaluation as a more objective approach
to the stabilization of industrial wage structures has manifested
itself in the increasing degree¹ to which psychologists are concerning
themselves with the various rating methods now in use. To examine by
psychological methodology some of these job rating techniques has been
the intent of the authors of this series of papers, the first² of which
reported the identification of factors or clusters of items which appear to
be functioning in one of the most widely used job rating systems.³ The
purpose of the investigation reported in the present paper was to
determine the extent to which abbreviations of this system yield the same or
comparable results and to examine any differences in terms of practical
significance.


Procedure and Results 


Method. Job rating data were collected from three plants which use
the NEMA system or a slight modification of it. For each plant,
intercorrelations between all eleven factors and total points were computed
and a correlation matrix was prepared.⁴








1 Herbert Moore. Problems and methods in job evaluation. J. consult. Psychol., 
1944, 7, 90-99. 

2 C. H. Lawshe, Jr., and G. A. Satter. Studies in job evaluation. I. Factor analyses
of point ratings for hourly-paid jobs in three industrial plants. J. appl. Psychol., 1944,
28, 189-198.

3 Job rating: Definition of the factors used in rating jobs—hourly rated occupations. 
Chicago: Industrial Relations Department, National Electrical Manufacturers Associa- 
tion, 1938. Pp. 22. 

4 These matrices together with descriptions of the three plants and other pertinent
data are presented in the first of this series of papers, C. H. Lawshe, Jr., and G. A. Satter,
op. cit.
The Wherry-Doolittle shrinkage selection method as reported by
Stead, Shartle, et al.⁵ was applied and the first three items were identified
in each plant. The multiple R’s are presented in Table 1.


Table 1
Correlation Coefficients between Ratings on Selected Items and Total Point Ratings

                                                          Plant
Selected Rating Scale Items                           A      B      C

Experience (or Learning time)                        .96    .93    .86
Experience (or Learning time) plus Hazards           .97    --     .91
Experience plus Initiative                           --     .95    --
Experience (or Learning time) plus Hazards
  plus Education                                     .98    --     --
Experience plus Hazards plus Initiative              --     --     .93
Experience plus Initiative plus Responsibility
  for the Safety of Others                           --     .96    --





Items Identified. As is shown in Table 1, the “experience or learning
time” item is the single variable which correlates highest with “total
points” in each of the three plants, the coefficients for plants A, B, and C
being .96, .93, and .86 respectively. In plant A, when “hazards” is added,
the multiple correlation becomes .97 and when “education” is added it
becomes .98. In plant B, when “initiative” is added the correlation is
increased to .95 and when “responsibility for the safety of others” is
added, the value is .96. For plant C, the multiple correlations are
increased to .91 and .93 with the subsequent inclusion of “hazards” and
“initiative.”

It should be pointed out that in none of the three plants did the R 
start to shrink when the third variable was added. However, the high 
value of the correlations, plus the fact that the increment resulting from 
the addition of the third variable is so small, makes further application of 
the technique seem unnecessary. The difference between these incre- 
ments added to the R’s by the third selected items as compared to the 
increment that would have been added by other items is so small that 
considerable sampling error could be present. For example, in Plant B, 
when the correlations are carried to a third decimal place, the addition of
“responsibility for the safety of others” to “experience” and “initiative”
increases the R from .953 to .962, an increment of .009. Other items
would have increased the obtained R by perhaps .007 or .008. It seems, 


5 William H. Stead, Carroll L. Shartle, et al. Occupational counseling techniques, pp.
245-252. New York: The American Book Company, 1940.
then, that too much importance should not be attached to the particular
item that was added last. In spite of this fact, however, Table 1 shows
a certain consistency from plant to plant in the particular items that were
selected. In the three variables selected, “experience or learning time”
appears in all plants, always first, while “hazards” and “initiative” each
appear in two of the plants.

Accuracy of Prediction. Table 2 lists the standard errors of estimate
for predicting the total point rating in each plant from one, two, and three
items in the rating scale.


Table 2
Standard Errors of Estimate for Predicting Total Point
Ratings from Selected Scale Items

                       Plant A         Plant B         Plant C
Selected Items       σ est.   %      σ est.   %      σ est.   %

Best Single Item      17.4   30       17.6   37        5.4   51
Best Two Items        13.7   24       14.4   30        4.5   42
Best Three Items      11.4   19       13.0   27        3.8   36


For example, the standard error of estimating
the total point rating for a particular job from the best three items is
11.4 in Plant A, 13.0 in Plant B, and 3.8 in Plant C. In other words, in
Plant A, the estimates for approximately two-thirds of the jobs are within
11.4 of the total point rating based on all eleven factors. The percentage
figures in Table 2 indicate the proportional size of the errors in terms of the
standard deviations of the several distributions.
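
The percentage figures in Table 2 can be checked roughly against the multiple R’s of Table 1. The following is a minimal sketch in Python, assuming the conventional relation between the standard error of estimate and the standard deviation of the criterion, σ est. = σ √(1 − R²); the paper does not print the formula it used, and the function name is illustrative only. Because the published R’s are rounded to two decimals, the computed percentages can be expected to agree with Table 2 only to within a few points.

```python
import math

def alienation_pct(r):
    """Standard error of estimate as a percentage of the criterion SD.

    Assumes sigma_est = sigma * sqrt(1 - R**2); this relation is an
    assumption, since the paper does not print its formula.
    """
    return 100.0 * math.sqrt(1.0 - r * r)

# Multiple R's for the best one, two, and three items (Table 1).
TABLE_1_R = {
    "A": [0.96, 0.97, 0.98],
    "B": [0.93, 0.95, 0.96],
    "C": [0.86, 0.91, 0.93],
}

for plant, rs in TABLE_1_R.items():
    pcts = [round(alienation_pct(r)) for r in rs]
    print(plant, pcts)
```

Multiplying such a percentage by the standard deviation of a plant’s total point ratings would give the standard error in points, as in the σ est. columns of the table.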


Practical Implications of Grade Placement 


Application of Abbreviated Scale. In the plants studied, how many 
jobs would actually be shifted insofar as rates of pay are concerned if an 
abbreviated scale were used? Is the standard error of estimate of 11.4 in 
Plant A practically significant? This question can best be answered 
through an analysis of the changes that would actually occur if only three 
items were used. 

Prediction Formula. Plant A has been selected as an example. Using
the data from this plant, the regression equation for predicting total
points from “experience or learning time,” “hazards,” and “education”
was found to be:


X_{TP} = 30.4 + 1.4 X_{exp.} + 5.4 X_{haz.} + 2.0 X_{educ.}


Point ratings on “experience or learning time,” “hazards,” and “education”
were substituted in the formula for each of the 247 jobs in the plant
to obtain the computed ratings. These computed values are shown
plotted against the total point ratings (eleven items) in the scattergram
(representing the previously mentioned R of .98) in Figure 1. Superimposed
over the scattergram are eleven shaded areas, each representing
a different labor grade. For example, jobs which “rate” from 144 points
to 165 points are in the second labor grade. Any job which falls inside
a shaded area is placed in the same labor grade by both the original scale
and the abbreviated scale, and any job which falls in an unshaded area
would be displaced one or more labor grades by the abbreviated scale.

Fig. 1. Graph showing ratings computed from three scale items plotted against
total point ratings (all eleven items) for 247 jobs in Plant A. The eleven shaded areas
define the labor grades designated by the numbered arrows.

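
The prediction formula and the labor grade comparison can be sketched together in Python. The coefficients are those of the Plant A regression equation; the grade boundaries are taken from the point ranges of the Plant A rate schedule (Table 4). The function names and the sample item ratings are hypothetical, and rounding the computed rating to a whole point before grading is an assumption.

```python
from bisect import bisect_left

# Upper point limits of labor grades 1-10 (Table 4); grade 11 is open-ended.
GRADE_UPPER_LIMITS = [144, 166, 188, 210, 232, 254, 276, 298, 320, 342]

def computed_rating(experience, hazards, education):
    """Total points predicted from three item ratings (Plant A equation)."""
    return 30.4 + 1.4 * experience + 5.4 * hazards + 2.0 * education

def labor_grade(points):
    """Labor grade (1-11) for a point rating, rounded to a whole point."""
    return bisect_left(GRADE_UPPER_LIMITS, round(points)) + 1

def grade_displacement(total_points, experience, hazards, education):
    """Number of labor grades by which the abbreviated scale shifts a job."""
    predicted = computed_rating(experience, hazards, education)
    return abs(labor_grade(total_points) - labor_grade(predicted))

# Hypothetical job: item ratings of 44, 10, and 28; eleven-item total of 215.
print(computed_rating(44, 10, 28))          # about 202 points
print(grade_displacement(215, 44, 10, 28))  # grade 5 against grade 4
```

A displacement of zero corresponds to a job falling inside a shaded area of Figure 1; any other value corresponds to a job in an unshaded area.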
Labor Grade Displacement. Table 3 shows that of the 247 jobs in this
particular plant, 153 or 62% would remain in the same labor grade, 92
or 37.2% would be displaced by one labor grade, while only 2 jobs or
0.8% would be displaced two labor grades. Table 3 also shows the
number of jobs deviating by varying numbers of points, classified as
“same labor grade,” “displaced one labor grade,” and “displaced two
labor grades.”


Table 3
Discrepancies between Total Point Ratings (Eleven Items) and
Ratings Computed from Three Items for Plant A

                 No. of Jobs by Labor Grade Displacement

Points of    Same Labor    Displaced One    Displaced Two
Deviation      Grade        Labor Grade     Labor Grades    All Jobs

  0-4            68              9                              77
  5-9            48             27                              75
 10-14           20             28                              48
 15-19           11             21                              32
 20-24            6              2                               8
 25-29                           4                               4
 30-34                           1               1               2
 35-39                                           1               1

Totals          153             92               2             247


This table and the data from which it was prepared
reveal that only seven jobs deviate from their original placement by more
than 22 points, the range of most of the labor grades. The fact that only
seven jobs have point differences greater than the difference between the
highest rated job and the lowest rated job in any given grade, is additional
evidence of the comparability of the two systems.
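
The percentages quoted from Table 3 follow directly from its column totals. A short check in Python, using the counts printed in the table; the variable names are illustrative only.

```python
# Column totals from Table 3 (Plant A).
displacement_counts = {
    "same labor grade": 153,
    "displaced one labor grade": 92,
    "displaced two labor grades": 2,
}

total_jobs = sum(displacement_counts.values())

# Share of jobs in each category, as a percentage to one decimal place.
displacement_pct = {
    label: round(100.0 * count / total_jobs, 1)
    for label, count in displacement_counts.items()
}

for label, pct in displacement_pct.items():
    print(f"{label}: {pct}%")
```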


Wage Structure Considerations 


Range of Rates. Examination of the rate schedule (Table 4) in Plant
A tends to minimize the practical importance of such differences as would
exist between the application of the original scale and the abbreviated
scale.


Table 4
Rate Schedule for Plant A

                                        Rates
Labor     Point                    One       Two
Grade     Range        Starting    Month     Months    Maximum

  1       Up to 144      .65        .70       .75        .81
  2       145 to 166     .65        .70       .75        .87
  3       167 to 188     .70        .75       .80        .93
  4       189 to 210     .75        .80       .85        .99
  5       211 to 232     .80        .85       .90       1.05
  6       233 to 254     .85        .90       .95       1.11
  7       255 to 276     .90        .95      1.00       1.17
  8       277 to 298     .95       1.00      1.05       1.23
  9       299 to 320    1.00       1.05      1.10       1.29
 10       321 to 342    1.05       1.10      1.15       1.35
 11       343 and up    1.10       1.15      1.20       1.41


The rate of $1.05 per hour, for example, is the maximum rate for
jobs in labor grade five and is earned by employees on some jobs which
are evaluated as low as 211 points. On the other hand, $1.05 is the
starting rate for labor grade ten and is paid to some employees on jobs
that are evaluated as high as 342 points. This rate, then, of $1.05 may
be paid at one time or another to employees on jobs ranging through six
different labor grades and evaluated at from 211 to 342 points. To be
sure, a progression system is used and varying amounts of time are spent
on jobs in these several labor grades when the $1.05 rate is paid. However,
the fact that a particular rate does occur in six consecutive labor
grades tends to minimize the practical significance of a few points of
difference on the scale. Figure 2 shows that nearly every job that falls
within the 211 to 342 point range on one scale also falls within the same
range on the other.

Fig. 2. Graph showing point range of jobs which at one time or another carry a rate of
$1.05. Note that only seven jobs fall in one shaded band but not in the other.

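
The overlap of rates across labor grades can be illustrated with a small Python sketch built on the starting and maximum rates of Table 4, expressed in cents per hour. It assumes that every rate between a grade’s starting and maximum figures can be reached under the progression system, which the table does not state outright since it lists only four steps per grade; the names used are hypothetical.

```python
# Plant A rate schedule (Table 4): grade -> (starting, maximum) in cents/hour.
RATE_SCHEDULE_CENTS = {
    1: (65, 81), 2: (65, 87), 3: (70, 93), 4: (75, 99),
    5: (80, 105), 6: (85, 111), 7: (90, 117), 8: (95, 123),
    9: (100, 129), 10: (105, 135), 11: (110, 141),
}

def grades_paying(rate_cents):
    """Grades whose starting-to-maximum span includes the given rate.

    Assumes every rate between the starting and maximum figures can be
    reached under the progression system; Table 4 lists only four steps.
    """
    return [
        grade
        for grade, (start, maximum) in RATE_SCHEDULE_CENTS.items()
        if start <= rate_cents <= maximum
    ]

print(grades_paying(105))  # grades in which $1.05 can be paid
```

Working in whole cents avoids floating-point comparisons at the grade limits, where $1.05 is exactly a maximum in one grade and a starting rate in another.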
The Reliability of Point Ratings


Practical Difficulties. The reliabilities of the original ratings in the 
three plants investigated are not known just as they are not known in 
any plant. It is impractical if not impossible to obtain re-ratings of jobs 
by a second jury of equal competence or of equal familiarity with the 
jobs. For this reason, it is impossible to determine whether or not the 
correlation between the complete scale and the abbreviated scale is as high 
as the reliability of the scale itself. Needless to say the reliability of 
human judgments rarely attains the magnitude of the correlations pre- 
sented in Table 1, whether they be judgments of personality traits or 
of physical phenomena. 

Table 5 sheds some light on the reliability problem. In a fourth plant,
the same jury of supervisors and analysts rated a group of five jobs on
March 16, rerated them on March 22, and again rated them on April 7.


Table 5
Point Ratings and Corresponding Rates for Five Jobs Rated at Three Different Times

                                                               Maximum
        First Rating    Second Rating    Third Rating          Change
Job     Points  Rate    Points  Rate     Points  Rate      Points  Rate

A        435    .95      395    .91       335    .85         100    .10
B        405    .92      330    .84       295    .81         110    .11
C        315    .83      260    .78       245    .77          70    .06
D        355    .87      370    .88       380    .89          25    .02
E        330    .84      380    .89       380    .89          50    .05

Mean Change                                                   71    .05




The table shows that the fluctuations in point ratings ranged from 25 to 
110 points with an average change of 71 points, while the corresponding 
rates that would be paid for jobs with these point values showed changes 
ranging from $.02 to $.10 per hour with an average change for the five jobs
of $.05 per hour. Perhaps the jobs changed; perhaps the analysts ac- 
cumulated additional information for the subsequent ratings; perhaps 
the analysts improved; or perhaps they got tired. Whatever the reason, 
the point values did fluctuate. The fact of these fluctuations further 
minimizes the practical importance of a few points in this particular job 
rating plan. That there is high agreement between the original system 
as it functions in the three plants and the abbreviated system is a fact. 
Whatever the amount of deviation, there has been no intention to imply 
that either the original scale or its abbreviation is the criterion against 
which to measure the other. Such deviations from perfect reliability as 
are almost certain to exist force the conclusion that the complete scale
and an abbreviation of it yield results which in terms of practical operation 
are almost identical. 
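
The point fluctuations reported from Table 5 can be reproduced with a few lines of Python. The sketch takes the maximum change for a job to be the spread between the highest and lowest of its three ratings, which is an assumption about how the column was computed, although it matches the printed point values; the names are illustrative only.

```python
# Point ratings for the five jobs at the three rating sessions (Table 5).
POINT_RATINGS = {
    "A": (435, 395, 335),
    "B": (405, 330, 295),
    "C": (315, 260, 245),
    "D": (355, 370, 380),
    "E": (330, 380, 380),
}

# Maximum change per job: highest rating minus lowest rating.
max_change = {job: max(pts) - min(pts) for job, pts in POINT_RATINGS.items()}
mean_change = sum(max_change.values()) / len(max_change)

print(max_change)
print(mean_change)
```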


Summary and Conclusions 


Job rating data from three different plants were subjected to the 
Wherry-Doolittle selection technique following the intercorrelation of 
points awarded on each item. The three items yielding the highest R 
with the total point rating from eleven items were identified for each 
plant and a more detailed study of Plant A was made. The following 
conclusions are supported:


1. In each of the three plants, the “experience or learning time” item 
in the scale correlated highest with the total point rating arrived at from 
eleven items, the r’s being .96, .93, and .86. 

2. The R for the combination of the three optimum scale items was 
found to be .98, .96, and .93 for plants A, B, and C, respectively. 

3. The items, “hazards” and “initiative” each appeared twice in the 
optimum three items in the plants studied. 

4. If the three-item abbreviated scale were employed in Plant A, 62%
of the jobs would remain in the same labor grade, 37.2% would be dis- 
placed one labor grade, and 0.8% would be displaced two labor grades. 

5. Such deviations as exist between point ratings assigned by the 
original scale and by the abbreviated scale seem practically unimportant 
in terms of the magnitude of the range within any given labor grade, the 
flexibility of the plant wage schedule itself, and the probable unreliability 
of the ratings. 

6. A simplified scale consisting of three or four items would probably 
yield results that are practically identical with those obtained by a more 
complex system and would greatly reduce the time consumed by the 
rating activity. 


Received June 30, 1944. 


























The Value of Aptitude Tests for Supervisory Workers in the 
Aircraft Engine and Propeller Industries 


John T. Shuman 
Williamsport Technical Institute, Williamsport, Pennsylvania 


What the future holds for any company depends in a large measure on 
the ability, vision, and leadership of those in supervisory and executive 
positions. Management today is quite generally free from too great 
dependence on rule-of-thumb and the shrewd-guess method of operating. 
Time-consuming engineering efforts bring new designs into being and provide
the tools and other facilities required for manufacture of an article. Yet
the problems involved in the selection and training of men as supervisors 
have received inadequate attention. Generally industrial managers 
would not countenance such thoughtless procedures in the handling of 
raw materials as are sometimes used in the selection and development of 
supervisors. This part of the investigation, however, is confined to only 
one phase of this problem—that of reporting the results obtained in com- 
paring certain test scores with the job success of the supervisors studied. 

In their present state tests are not wholly adequate for predicting 
supervisory success. However, tests and rating scales do provide good 
and effective bases from which to start, or checks by which to gauge 
decisions. Supervisory and executive ability arise from the interaction 
of many different qualities and abilities, and no single test will measure 
entirely the many qualities and abilities involved. 


Method 


The foremen, assistant foremen, set-up men, and group leaders as 
the case may be, were tested in small groups. These men were already 
working as supervisors when given the tests; hence, it is safe to assume 
that some form of natural selection had already taken place in most 
instances. 

The rating sheet illustrated here was used to secure a fairly objective 
rating on the job success of each supervisor tested. These ratings were 
made by superiors of the supervisors studied. No more objective meas- 
ures of job success were available which would have been even relatively 
free from possible distortion by factors beyond the control of the super- 
visor. It is true, of course, that the study therefore is valid to the extent 
that the ratings represent a true picture of the job success of these
individuals. Relatively few ratings of “poor” were secured, probably due to
the reluctance of superiors to so rate the men. For this reason, many of
the comparisons will be made between the “excellent” and “average”
ratings. In other words, the percentage of excellent ratings will be used
to determine the efficacy of the tests. This procedure is justified for 
several reasons: 1. The small proportion of poor ratings. 2. The prob- 
able operation of the so called “halo effect” among the average ratings. 
3. The excellent foreman is the one in whom we are most interested. 

Experience in giving these tests to groups of foremen indicates that 
giving tests to groups of older employes is not too desirable a practice if it 
is at all possible to test all employes at the time they are hired. Since a 
man or woman applying for a position is usually willing to take a test, no 
problem of morale is involved because of the possibility of imminent 
decisions affecting the man’s job in the near future. 





Rating Sheet Used in Rating Supervisors


Date
Name
Position
Rated by


1. Production: Consider whether work generally moves through this
department on schedule, scrap and rework.

2. Handling workers: Consider discipline; extent to which this man has
difficulty with his men; attitude of men toward company policies, etc.

3. Condition and maintenance of department: Good housekeeping, safety,
condition of machines.

4. General: In general and from all aspects how would you rate.

Table 1 presents a summary of recommended minimum critical scores
and the percentage of improvement effected in the selection of foremen
at the three manufacturing plants studied.


Table 1
Summary of Recommended Minimum Critical Scores and Improvement Effected
in Selection of Foremen, All Plants

                                                        Per Cent Improvement in
                       Minimum Critical Score              Excellent Ratings
                                American   Spencer                American   Spencer
                     Lycoming   Propeller  Heater      Lycoming   Propeller  Heater
Test                 (N=99)     (N=89)     (N=24)      (N=99)     (N=89)     (N=24)

Otis Q.S. Test of
  Mental Ability
  Beta, A              33         30         32           9         5.4        42
Minnesota Paper
  Form Board, AA       24         34         18           8         5.4        21
Bennett Test of
  Mechanical
  Comprehension, AA    30         30         25          10         8          12





The per cent improvement effected in the selection of foremen was
greatest at the Spencer Heater Plant and least at the American Propeller 
Plant with the results at the Lycoming Plant between the two. It is 
significant, however, that an improvement in selection would be possible 
at each of the three plants with all of the tests used. 

Table 2 summarizes the results with group leaders and job-setters at
two of the plants. The third plant, Spencer Division, had no supervisors
in this category. The percentage of improvement in selection possible at
both the plants represented in Table 2 is rather high. The results secured
here with the group leaders in the American Propeller Corporation are
much higher and more positive than those secured in the same company
for foremen and reported in Table 1.


Table 2
Summary of Recommended Minimum Critical Scores and the Improvement Effected in
Selection of Job-Setters and Group Leaders, All Plants

                                  Minimum Critical       Per Cent Improvement
                                       Score              in Excellent Ratings
                                            American                  American
                                Lycoming    Propeller     Lycoming    Propeller
Test                            (N=25)      (N=60)        (N=25)      (N=60)

Otis Q.S. Test of Mental
  Ability Beta, A                 34          34            17          24
Minnesota Paper
  Form Board, Revised, AA         30          24            30          12
Bennett Test of Mechanical
  Comprehension, AA               36          27            47          14

As indicated in Table 3, the use of these three tests would improve 
generally the selection of excellent supervisors by 15 to 20 per cent. 


Table 3
Average Improvement Effected in Selection of Excellent Supervisors,*
All Plants, All Supervisors, N = 297

                                 Mean Improvement in       Mean Improvement
                                 Selection of Excellent    in Selection of
                                 Supv. at Minimum          Excellent Supv.
Test                             Critical Scores           at Q1

Bennett Test of Mechanical
  Comprehension, AA                    18%                      20%
Otis Q.S. Test of Mental
  Ability Beta, A                      19%                      17%
Minnesota Paper
  Form Board, Revised, AA              15%                      17%

* Includes all supervisors: foremen, assistant foremen, group leaders, and job setters.


In Table 4 statistically significant correlations were secured on the Otis
test in the Lycoming and Spencer Heater plants; on the Bennett test in
the Lycoming and Spencer plants; and on the Minnesota Paper Form
Board in only the Lycoming plant. No statistically significant correlations
were secured on any one of the three tests at the American Propeller
Corporation Plant.


Table 4
Summary of Correlations between Job Ratings and Test Results,
Foremen and Assistant Foremen, All Plants

                                           r bis
                                 Job Rating with Raw Scores
                                           American       Spencer
                             Lycoming      Propeller      Heater
                             Division      Corporation    Division
Tests                        (N = 99)      (N = 89)       (N = 24)

Otis Q.S. Test of
  Mental Ability
  Beta Test, A
Minnesota Paper
  Form Board,
  Revised, AA
Bennett Test of
  Mechanical
  Comprehension, AA

Table 5 summarizes the correlations obtained on the group leaders and
job setters. The correlations with but one exception are significant.
This one exception, an r of .37 ± .10, on the Bennett Test with the
American Propeller Corporation group leaders, approximates a significant
correlation. Again, the results with this group of American Propeller
Corporation supervisors are much more positive than the results for the
foremen of the same company reported in Table 4.


Table 5
Summary of Correlations between Job Ratings and Test Results,
Group Leaders and Job Setters, All Plants

                                                      r bis
                                           Job Rating with Test Scores
                                           Lycoming       American
                                           Division       Propeller
Test                                       (N = 25)       (N = 60)

Otis Q.S. Test of Mental
  Ability Beta, A                          .46 ± .14      .54 ± .08
Minnesota Paper Form
  Board, Revised, AA                       .59 ± .13      .39 ± .09
Bennett Test of Mechanical
  Comprehension, AA                        .73 ± .10      .37 ± .10


Averaging the r’s secured with all supervisory groups yields the results
shown in Table 6. The Bennett Test proved to be the most effective, the
Otis Test next most effective, and the Minnesota Test the least effective
of the three.


Table 6
Mean Correlations between Job Ratings and Test Scores, All Supervisors, All Plants

                                                       Mean Biserial r’s,
                               Mean Biserial r’s,      Supervisors, All Plants
                               Supervisors,            Except Amer. Prop.
                               All Plants              Foremen *
Test                           (N = 297)               (N = 208)

Bennett Test of
  Mechanical
  Comprehension, AA            .45 ± .04               .55 ± .04
Otis Q.S. Test of
  Mental Ability
  Beta, A                      .42 ± .04               .51 ± .045
Minnesota
  Paper Form Board,
  Revised, AA                  .33 ± .04               .39 ± .05

* The results secured with the Foremen, American Propeller Corporation, did not
coincide with those secured for the other four groups of supervisors including the Group
Leaders from the same Company. Since these results were so much out of line, the
author is taking the liberty of using a mean in this column which excludes this group.


Conclusions 


1. Job-success on supervisory work was found to be related positively 
and significantly to the test scores in these three dissimilar industrial 
plants engaged in some phase of metal working. It is significant that 
each of these plants was entirely different from the others in the product 
manufactured and the character of work performed in the plant. It is 
reasonable to conclude that supervisory work in these different plants 
has some factors, skills, or aptitudes in common measured by the tests. 

2. The promotability of minor supervisors such as group leaders and 
job setters was found to be related positively and significantly with the 
test scores. 

3. The means obtained by the foremen and group leaders on the tests 
at Lycoming were below those obtained by workmen in the more skilled 
job categories in the factory. This is probably due to the effect of addi- 
tional experience and service of this group; that is, years of experience 
on the job count heavily in recommendations for promotion. Further- 
more, the tendency of individuals of superior ability to leave routine jobs 
would drain off many capable individuals before they
had acquired a sufficient background of experience to be given the respon- 
sibility of supervisory work. 

4. It is also logical that tests might well be utilized to survey a plant in 
order to determine whether its supervisory force measures up to recog- 
nized standards. In other words, the results indicate that there are levels 
below which the supervisory force of a plant should not fall. In this way, 
the proper tests would become one objective measure of a supervisory 
force, almost wholly dissociated from experience. 


Received March 15, 1944. 








Relationship between Interests and Abilities: A Study of the 
Strong Vocational Interest Blank and the Zyve 
Scientific Aptitude Test 


Louis Long 
Student Personnel Bureau, College of the City of New York 


If the vocational counselor is to do an adequate job he must constantly 
seek to discover relationships between the various techniques that he 
uses. The present study arose from speculation about the relationship 
between interests and ability as determined by two standardized tech- 
niques: The Strong Vocational Interest Blank (6) and the Zyve Scientific 
Aptitude Test (8). Since most studies have reported a slight positive 
relationship between interests and abilities the question arose as to 
whether or not students who rated high on the scientific scales of the 
interest blank would do better on the Scientific Aptitude Test than stu- 
dents who rated low. Casual inspection of scores as they were discussed 
in interviews suggested that there was a positive relationship and the 
results of this study confirm this impression. 

Such a relationship is certainly to be expected if the two techniques 
measure what they purport to measure. In this connection it should be 
mentioned that there is some question as to what the Scientific Aptitude 
Test measures (2). Only a slight relationship has been found between 
the scores on this test and college grades (1, 4). Nor does there seem to 
be any great amount of communality between the Scientific Aptitude 
Test and tests of general intelligence (1). If this test is measuring scien- 
tific aptitude it is to be expected that students who rate high on the Strong 
scales for engineers or physicists would make, on the average, higher scores 
on the Scientific Aptitude Test than do students who rate low on the scales 
for these occupations. If the Scientific Aptitude Test is measuring scien- 
tific ability it can also be anticipated that there is a greater degree of rela- 
tionship between scores on this test and high ratings on the “scientific 
scales” of the Strong questionnaire than between scores on this test and 
high ratings on the “nonscientific scales” of the Strong questionnaire. If
such hypotheses are verified by an analysis of the scores on the two instru- 
ments it can then be inferred that the Scientific Aptitude Test is measuring 
some phase of ability that separates students who have interests similar 
to the scientist from those who do not have interests similar to the scien- 
tist. Consequently a more positive relationship between the Scientific 
Aptitude Test and the scientific scale of the Strong than between the
Scientific Aptitude Test and any other scale of the Strong would suggest 
that the two instruments should be used to supplement each other in 
guidance work. 


Scores Used in the Statistical Analysis 


Strong has published norms not only for the individual occupations 
but also for groups of occupations (6). At the Student Personnel Bureau 
of the College of the City of New York it has become standard practice 
first to score the Strong blank against group occupational keys. If a 
breakdown of any group is desirable this can be done later. This procedure
greatly reduces the scoring time and on the basis of the reported
correlations between the group occupational keys and the individual occu- 
pational keys this short cut seems justifiable (5, 6). The occupational 
groups and the occupations included in each are listed in Table 1.


Table 1
List of Occupations Included in Each of the Six Occupational Groups
of the Strong Vocational Interest Blank

Group 1. Technical Non-mathematics      Group 4. Business Detail
    Artist                                  Accountant
    Architect                               Office Worker
    Psychologist                            Purchasing Agent
    Physician                               Banker
    Dentist

Group 2. Technical Mathematics          Group 5. Business Contact
    Mathematician                           Sales Manager
    Engineer                                Real Estate Salesman
    Chemist                                 Life Insurance Salesman
    Physicist

Group 3. Welfare                        Group 6. Verbal
    Y. M. C. A. Physical Director           Advertising Man
    Y. M. C. A. General Secretary           Lawyer
    Personnel Worker                        Author
    Social Science Teacher                  Journalist
    City School Superintendent
    Minister


Since this study is primarily concerned with the Strong questionnaire as a tool
for use in guidance work, the statistical treatment will be oriented around
the letter ratings that Strong uses to indicate the degree of common interest
between the individual and a particular occupational group. The raw
scores could have been used, but it was thought that the results would be
more typical if the letter ratings were employed. In several places
throughout the report a division has been made between students obtaining
an A or B+ rating and those obtaining a B, B-, C+, or C rating.
This dichotomy is arbitrary, but such a division is often made in counseling
on the basis of Strong’s suggestion that “a person should consider
seriously those occupations in which he receives A or B+ ratings before
entering some other occupation” (6).

The total score of the Scientific Aptitude Test has been used through- 
out this report. The scoring procedure recommended by Zyve (8) was 
followed. 


Subjects and Test Scores 


Scores on both the Strong Vocational Interest Blank and the Scientific 
Aptitude Test were available for 200 students of the College of the City 
of New York. Each student had sought advice from some staff member 
of the College’s Student Personnel Bureau and at the suggestion of the 
latter had taken both the Strong and the Zyve. Scores on the Thurstone 
A. C. E. Psychological Examination were also available for these students. 


Results 


The average score on the Scientific Aptitude Test for each letter 
rating of the six occupational groups will be found in Table 2. For 
example, the average score on the Scientific Aptitude Test for students 
obtaining an A rating on the Strong Technical Non-mathematics group 


Table 2

Relationship between Scores on Zyve Scientific Aptitude Test and Ratings
on Strong Vocational Interest Blank

                      Average Score on Zyve Test According to Categories of Strong
Occupational
Groups of Strong        A        B+       B        B-       C+       C

Tech. Non-Math.       107.4     96.5     93.5     93.0    100.0     85.0*
Tech. Math.           111.4    106.3     88.2     90.8     91.9     87.4
Welfare                96.7    101.9     98.9    105.2     92.0    119.8*
Business Detail        84.1    103.4     93.9     97.2     97.0    106.6
Business Contact       88.8     82.4     94.8     99.7    105.2    107.6
Verbal                 94.5     97.6    100.0    101.9    104.9    115.4*

* Average based on less than 10 cases.


was 107.4. A positive trend will be noted in the case of the Technical 
Non-mathematics and Technical Mathematics groups; a negative trend 
is found in the case of the Business Contact and Verbal groups; but no 
definite trend is apparent in the two remaining groups: Welfare and Busi- 
ness Detail. To form some idea of the significance of these trends the six 
steps on the Strong scale were consolidated into two steps: ratings of A and

Table 3

Reliability of the Difference between Average Scores on Zyve Scientific Aptitude Test for
High and Low Ratings on Strong Vocational Interest Blank

                     Subjects with Ratings of    Subjects with Ratings of    P-value of the
                     B+ or more on Strong        B or less on Strong         Difference
Occupational                                                                 between
Groups of Strong     N       Average             N             Average       Averages

Tech. Non-math.      110     [illegible]         [illegible]   [illegible]   <.01
Tech. Math.           97     [illegible]         103           [illegible]   <.001
Welfare              108     [illegible]          92           [illegible]   [illegible]
Business Detail       24     [illegible]         176           [illegible]   [illegible]
Business Contact      34     [illegible]         166           [illegible]   [illegible]
Verbal                84     [illegible]         116           [illegible]   [illegible]

B+ and ratings of B or less. The average score on the Scientific Aptitude
Test for students falling into these two categories will be found in Table 3. 
The reliability of the difference between the averages was determined by 
using Student’s t-Test (3). The P-values presented in Table 3 indicate 
that in three of the six cases such a difference between the averages would 
be expected to occur by chance less than five times in 100. Using this 
as a criterion of a significant difference we can say that the following trend 
is significant: students scoring high on the Scientific Aptitude Test rate 
higher on the Strong Technical Non-mathematics and Technical Mathe- 
matics groups, but lower on the Business Contact group than do those 
scoring low on the Scientific Aptitude Test. 
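
The comparison of high-rated and low-rated students described above can be illustrated with a minimal sketch of Student's pooled-variance t statistic. The scores below are invented for illustration and are not the study's data, and `pooled_t` is a hypothetical helper name; the sketch assumes two independent groups with roughly equal variances.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(high, low):
    """Student's t for two independent groups, assuming equal variances."""
    n1, n2 = len(high), len(low)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(high) - mean(low)) / std_error, n1 + n2 - 2

# Hypothetical aptitude scores (not taken from the study).
high_interest = [110, 105, 120, 98, 112]   # students rated B+ or more
low_interest = [90, 95, 88, 102, 85]       # students rated B or less

t_value, df = pooled_t(high_interest, low_interest)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

The resulting t would then be referred to a table of Student's distribution with the stated degrees of freedom to obtain a P-value of the kind reported in Table 3.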


Table 4

Bi-serial Correlations between Scores on the Scientific Aptitude Test and Ratings on
Strong Vocational Interest Blank (B+ or more versus B or less)

Occupational Groups     Bi-serial
of Strong               Correlation

Tech. Non-math.           0.26
Tech. Math.               0.50
Welfare                  -0.04
Business Detail          -0.12
Business Contact         -0.37
Verbal                   -0.14


If the Scientific Aptitude Test is positively related to any occupational 
group of the Strong scale it would be expected that the Technical Mathe- 
matics group would be the most likely one to show this relationship. The 
second most likely group would be the Technical Non-mathematics. Pos- 
itive relationships were found in both instances. The extent of the 
relationship is, however, greater for the Technical Mathematics group
since the P-value in one case is 0.001 and only 0.01 in the other. This
difference can also be brought out by calculating a bi-serial correlation 
between scores on the Scientific Aptitude Test and the categories of B+ 
or more versus B or less on the Strong scale. When this is done a bi-serial 
r of 0.50 is obtained for the Technical Mathematics group and one of 0.26 
for the Technical Non-mathematics. (The correlations for the other 
occupational groups will be found in Table 4.) 
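
A bi-serial correlation of the kind described above can be sketched as follows. This assumes the textbook formula r = ((M_high - M_low) / s) * (pq / y), where s is the standard deviation of all scores, p and q are the proportions in the two categories, and y is the ordinate of the unit normal curve at the dividing point. The report does not state which computational form was used, so the function name `biserial_r` and the data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(scores, rated_high):
    """Biserial r between a continuous score and a forced high/low split."""
    high = [s for s, h in zip(scores, rated_high) if h]
    low = [s for s, h in zip(scores, rated_high) if not h]
    p = len(high) / len(scores)      # proportion in the high category
    q = 1.0 - p
    # Ordinate of the unit normal curve at the point cutting off proportion p.
    # (Both categories must be non-empty, otherwise inv_cdf raises an error.)
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(high) - mean(low)) / pstdev(scores) * (p * q / y)

# Hypothetical aptitude scores and ratings (not taken from the study).
scores = [110, 105, 120, 98, 112, 90, 95, 88, 102, 85]
rated_high = [True] * 5 + [False] * 5   # True = rated B+ or more

r_bis = biserial_r(scores, rated_high)
print(f"biserial r = {r_bis:.2f}")
```

Because the coefficient assumes a normally distributed trait underlying the dichotomy, it runs larger in absolute value than the point-biserial coefficient computed from the same data.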

The negative or chance relationship between the ratings on the four 
other occupational groups and scores on the Scientific Aptitude Test 
strengthens the above finding since there is no reason to expect a direct 
correspondence between the two techniques in these cases. 

In an effort to determine whether the students with high scores on the 
Technical Mathematics group of the Strong scale were just superior 
students or “high scorers” the same analysis was applied to the total score 
made on the Thurstone A. C. E. Psychological Examination. Of the 
200 students in the preceding comparisons, 95 took the 1938, 1939, or 1940 
edition of the Thurstone test. The scores on the 1938 and 1940 editions 
were converted into 1939 equivalent scores by using the table supplied by 
Thurstone (7). The average scores on the Thurstone test for students 
obtaining an A, B+, B, B—, C+, or C rating for the six occupational 
groups of the Strong scale will be found in Table 5. The absence of 


Table 5

Relationship between Scores on Thurstone A. C. E. Psychological Examination (1939)
and Ratings on Strong Vocational Interest Blank

                      Average Score on Thurstone According to Categories of Strong
Occupational
Groups of Strong        A        B+       B        B-       C+       C

Tech. Non-math.       124.0    111.4*   124.7    114.3    135.3*   132.0*
Tech. Math.           121.3    129.8    120.0    121.8    132.9*   120.6*
Welfare               125.0    127.4    118.9    127.5*   122.0*   116.6*
Business Detail       116.1*   110.7*   126.1    118.7    128.1    124.4
Business Contact      145.0*   109.8*   122.0    127.1    121.1    119.3
Verbal                129.8    121.7    120.9    120.5    111.7*   116.0*

* Average based on less than 10 cases.



definite relationships between scores on the Strong scale and scores on the 
Thurstone test is clearly evident, except in the case of the Verbal group. 
When the average Thurstone score for students rating high on the Strong 
scales was compared with the average Thurstone score for students rating 
low on the Strong scales a significant difference was found only in the case 
of the Verbal group (Table 6). It seems logical to expect a relationship 
between the ratings on the Verbal group of the Strong scale and the scores 
on the Thurstone test, since the latter is so highly verbal.

Table 6

Reliability of the Difference between Average Scores on Thurstone A. C. E. Psychological
Examination (1939) for High and Low Ratings on Strong Vocational Interest Blank

                     Subjects with Ratings of    Subjects with Ratings of    P-value of the
                     B+ or more on Strong        B or less on Strong         Difference
Occupational                 Average Score               Average Score       between
Groups of Strong     N       on Thurstone        N       on Thurstone        Averages

Tech. Non-math.      49      121.7               46      124.8               >.40
Tech. Math.          45      123.6               50      122.8               >.80
Welfare              49      125.8               46      120.4               >.10
Business Detail      10      114.5               85      124.2               >.10
Business Contact     15      126.2               80      122.6               >.50
Verbal               40      128.5               55      119.3               <.05


Summary


The relationship between interests and scientific ability as measured 
by the Strong Vocational Blank and the Zyve Scientific Aptitude Test 
was investigated. Use of the occupational group scales on the Strong 
disclosed that in three of six groups there was a reliable difference between 
the average score on the Scientific Aptitude Test for students rating high 
on the interest questionnaire and for those rating low. The findings indi- 
cate that students scoring high on the Zyve Test rate higher on the Strong 
Technical Non-mathematics and Technical Mathematics groups, but 
lower on the Business Contact group than do those scoring low on the 
Zyve Test. In order to rule out the possibility that these results were due 
to the selection of superior students the same analysis was applied when 
scores on the Thurstone A. C. E. Psychological Examination (1939) were 
substituted for scores on the Zyve Scientific Aptitude Test. No definite 
relationship between scores on the Thurstone test and ratings on the 
Strong scales was found except in the case of the Verbal group. 

The results indicate that the Zyve Scientific Aptitude Test is measur- 
ing some phase of ability that separates students having interests similar 
to those found among the occupational groups included in the Technical 
Mathematics and the Technical Non-mathematics groups of the Strong 
from those students who do not have interests similar to those found in the 
above occupational groups. From the common sense point of view the 
agreement between the two measurements is to be expected. The Zyve 
Test deals largely with problems involving mathematics and principles of 
physics. Even when the items are intended to measure general abilities 
(such as reasoning, generalizing, and suspending judgment) the content of 
the item is usually drawn from the physical science field. Similarly, in-
spection of the keys of the Strong Technical Mathematics and Technical
Non-mathematics scales reveals that the items dealing with scientific and 
technical subjects or activities are heavily weighted. Consequently some 
agreement between the two instruments would be expected. The extent 
of this agreement, however, is far from perfect. For example, there is not 
enough agreement to permit the prediction of a score on the Zyve Scien- 
tific Aptitude Test from a rating on the Technical Mathematics scale of 
the Strong. It is, therefore, concluded that the use of both of these 
instruments in counseling is better procedure than the use of either one or 
the other if the capacity of a student to do work in engineering or science 
is under consideration. 


Received May 15, 1944.

The Measured Interests of Marine Corps Women Reservists *


Milton E. Hahn, Captain, USMCR, ** and Cornelia T. Williams, 
Major, USMCWR 


Headquarters, U. S. Marine Corps, Washington, D. C. 


A fundamental purpose of the whole Marine Corps’ system of person- 
nel classification is the assignment of each individual to the type of mili- 
tary duty where he or she can soonest and most efficiently serve the 
Marine Corps. In the accomplishment of this purpose, and as a pre- 
liminary to military assignment, each recruit is tested, interviewed, and 
given an opportunity to express his “choice of duty”; and all of the in-
formation thus obtained is recorded on his Qualification Card, Form 940.
In this way, his basic aptitudes, his education and work experience, his 
special skills, and at least a crude indication of his interests become avail- 
able as a basis for determining the most appropriate military assignment 
for each Marine. 

Civilian occupational backgrounds of women enlisting in the Marine 
Corps supplied extremely useful data for proper assignment to military 
duty, but were at times misleading, or inadequate. For example, many 
school teachers and clerical workers joined the Marine Corps to escape 
from their civilian jobs. Many of the younger women joining the Corps 
had little, if any, previous work experience, and therefore, possessed no 
specific occupational skills. Others had considerable experience and had 
even developed a high degree of skill, but in a specialty so unrelated to 
any of the jobs open to women in the Marine Corps, that this experience 
was of little help in determining appropriate military assignments. Fi- 
nally, many of the occupational specialties performed by women Re- 
servists in the Marine Corps could not be significantly differentiated by 
available estimates or measures of basic aptitudes. 

These facts made it obvious that in many instances a crucial factor in 
determining a military assignment would have to be a woman’s interest in 
doing a particular type of work. This was especially true for women who 

* This report is part of a study made by the authors for the Classification Division, 
Detail Branch, Personnel Department, USMC, Headquarters, Washington, D.C. The 
opinions or assertions contained in this article are the private ones of the authors and
are not to be construed as official or reflecting the views of the Navy Department or the
naval service at large. 


** Dr. M. E. Hahn is now Director of the Psychological Services Center at Syracuse 
University, Syracuse, N. Y.

had no special occupational skills, and for women who possessed occupa-
tional skills which could not be used directly in military duties. The need
for a valid estimate of interests was equally great in selecting women for 
assignment to certain military jobs in which large numbers of women were 
needed but for which special training had to be arranged because so few 
women entering the service had had relevant previous experience. 

A search was necessary, therefore, for tools and techniques which could 
be utilized to bring about the assignment of women in the Marine Corps 
to duties for which they would not only possess the necessary minimum 
aptitudes and skills but in which they would also be interested and satis- 
fied, and therefore, able to perform with greater efficiency. 

In the early stages of the Marine Corps Women’s Reserve program the 
only technique used for determining the interests of individuals was the 
interview. For most of the enlisted women this amounted to little more 
than one or two direct questions about “choice of duty” inserted at the 
end of the classification interview. A minority of each recruit class (those 
being considered for a few special types of military assignment) were 
reinterviewed to obtain a more precise description of their previous ex- 
perience or to permit a clearer expression of their interests. Although the 
information relative to interests derived from interviews was helpful in 
making assignments to duty in the Marine Corps, a year of experience 
demonstrated the need for supplementary screening devices. Interviews, 
if they were to yield significant information on interests, were time- 
consuming; interviewers were variable in the reliability and validity of 
their judgments; and the claimed interests of the women, expressed 
merely as a preference for a specific military assignment, often had little 
or no validity as an indication of real interest in actual work activities. 

A study of the situation was authorized by the Director of Personnel, 
Headquarters, United States Marine Corps, in March, 1944. The study 
included three major aspects: (1) the job satisfaction of female personnel 
in selected military occupations; (2) the measured interests of women in 
military occupations; and (3) a comparison of claimed and measured 
interests of women performing military duties in the Marine Corps. This 
report is concerned with the second aspect of the study, the measured 
interests of women now performing certain military duties. 

It was decided to select for study a sample of women Reservists cur- 
rently assigned to several widely different types of military duty. Mat- 
ters of expediency limited the selection to those most readily available 
for testing. All of the women included in the study here reported were on 
duty either at Marine Corps Headquarters, Washington, D. C., or at the 
Marine Corps Air Station, Cherry Point, North Carolina. 

Descriptive data on 667 enlisted women Reservists in the study are
presented in Table 1.¹ (Three groups of women officers were also in-
cluded in the study of measured interests but are not included in this
table.) No marked differences are noted among groups in age or years of 
education. The mean age of motor vehicle operators is somewhat higher 
than the average age for the total group—27.4 vs. 24.6 years. The mean 
age for aviation machinist mates is somewhat lower than the mean age 
for the total enlisted group—22.6 vs. 24.6 years. Otherwise, the groups 
are relatively homogeneous in so far as age and years of education are 
concerned. 

The Army General Classification Test, Forms c or d, and the Army
Mechanical Aptitude Test, Form 3, are administered to all women Re- 
servists. Norms for Army men are used. The minimum education 
requirement for enlistment in the Marine Corps Women’s Reserve, com- 
pletion of the twelfth grade or the equivalent, has resulted in a mean 
standard scale score of approximately 111 for enlisted women in the 
GCT. The mean GCT score for the enlisted sample discussed here is 
114.0, somewhat above the mean of 102 obtained by enlisted men. 

On the GCT, synthetic training devices instructors, stenographers, 
and the composite group of miscellaneous occupations labelled “Other” 
were above the group mean with respective means of 121.0, 117.8, and 
119.2; standard deviations for these groups were similar to the one for 
the total group. 

The only sub-group with a mean appreciably lower than that for the 
total group was Cooks and Bakers (mean 102.5). This group was also 
the most variable in the GCT scores. 

Scores on the Army Mechanical Aptitude Test Form 3 showed no
significant differences between means for the military occupational sub-
groups and the mean for the total sample except for aviation machinist
mates. The $D/\sigma_D$ for the total mean and that for the aviation machinist
mate sub-group was 3.93.

Civilian occupational backgrounds of these enlisted women were het- 
erogeneous. The 161 women Reservists assigned to duty as clerk-typists, 
for example, came from 25 different civilian occupations. 

Two interest tests or inventories were considered. The Strong Voca- 
tional Interest Blank, although it is the best validated and most widely 
used interest inventory, was impracticable from the standpoints of scoring 
costs and complexity of interpretation. The Kuder Preference Record,²

1 Discrepancies in the number of cases reported between Table 1 and Table 2 are 
caused by certain records being unavailable for various reasons. 


2 The Kuder Preference Record Form BB, Science Research Associates, Chicago,
Illinois.

although it is a comparatively new instrument, offered advantages in the
simplicity of its administration, scoring, and interpretation. The Kuder 
Preference Record, therefore, was used to measure interests. 

The Kuder Preference Record contains nine scales which are purported 
to measure intensity of interest in nine broad areas of occupational ac- 
tivity. These nine areas are: (1) mechanical, (2) computational, (3) 
scientific, (4) persuasive, (5) artistic, (6) literary, (7) musical, (8) social 
service, and (9) clerical.³

This test is taken by punching with a stylus two of six possible choices 
for each item. Scoring is extremely simple. Single unit weights are used
for scoring the items. The nine scores on the Preference Record yield an 
interest pattern or profile for the respondent. Adequate interest profile 
norms are not available for civilian occupational groups. 

The test was standardized with the assumption that significant sex 
differences did not exist. The validity of this assumption can be tested 
by a comparison between the original norms, based upon a mixed group 


Table 2

Means and Standard Deviations for 791 USMCWR on the Nine Scales
of the Kuder Preference Record, Form BB (N = 515)

                         Means                Standard Deviations
                                Kuder                    Kuder
Kuder Scales        USMCWR     Norms*       USMCWR      Norms*      $\sigma_D$

Mechanical           59.1       53.          20.18       18.         1.070
Computational        29.9       30.          12.62       16.5         .854
Scientific           57.6       51.          15.76       13.8         .828
Persuasive           61.        71.5         16.08       13.8         .835
Artistic             55.2       49.          15.74       16.3         .909
Literary             54.1       57.          15.39       15.5         .875
Musical              23.2       25.5          9.69       10.          .559
Social Service       78.1       72.          18.90       17.         1.006
Clerical             51.6       60.          16.48       15.5         .901

* Means and standard deviations for the Kuder Norm group were estimated from
percentile norms on the Profile sheet based upon the assumption that distributions were
normal.


of college men and women, and the average scores of the Marine Corps 
Women Reservists. These comparisons are presented in Table 2. Al- 
though data on Kuder’s original standardization group were not available 
for proper tests of the significance of differences between means, tentative 


3 Manual for the Kuder Preference Record, Science Research Associates, Chicago,
Illinois. Unfortunately, the present manual (June 1944) for the Preference Record is
inaccurate and misleading. Those interested in the inventory are referred to the bib-
liography at the end of this report.

tests were made based on the assumption that distributions for Kuder’s
norm group were normal. These data are presented in Table 3. The 
null hypothesis was not refuted. 
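
The last column of Table 2 appears to be the standard error of the difference between two independent means, the denominator of a D/σ_D ratio. As a hedged check, the sketch below recomputes it from the tabled standard deviations, assuming 791 Reservists and taking the 515 in the table title to be the size of Kuder's norm group (an assumption, not something the report states outright). Entries printed without a decimal point in the scan, such as 854, are read as .854, and `se_of_difference` is a hypothetical helper name.

```python
from math import sqrt

N_USMCWR = 791   # from the title of Table 2
N_NORMS = 515    # assumed to be the size of Kuder's norm group

def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means."""
    return sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# (scale, USMCWR SD, norm-group SD, value in the last column of Table 2)
rows = [
    ("Mechanical", 20.18, 18.0, 1.070),
    ("Computational", 12.62, 16.5, 0.854),
    ("Scientific", 15.76, 13.8, 0.828),
    ("Persuasive", 16.08, 13.8, 0.835),
    ("Artistic", 15.74, 16.3, 0.909),
    ("Literary", 15.39, 15.5, 0.875),
    ("Musical", 9.69, 10.0, 0.559),
    ("Social Service", 18.90, 17.0, 1.006),
    ("Clerical", 16.48, 15.5, 0.901),
]

largest_gap = 0.0
for scale, sd_usmcwr, sd_norms, reported in rows:
    computed = se_of_difference(sd_usmcwr, N_USMCWR, sd_norms, N_NORMS)
    largest_gap = max(largest_gap, abs(computed - reported))
    print(f"{scale:15s} computed {computed:.3f}   reported {reported:.3f}")
```

Under these assumptions every recomputed value falls within a few thousandths of the tabled figure, which supports reading the column as σ_D.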


Table 3

Decile Norms for 791 Adult Women, USMCWR, on the Nine Scales
of the Kuder Preference Record

Raw Score
Deciles

Kuder Scales 10 20 30 40 50 60 70 80 90 100 Range

Mechanical 3 40 4 &2 S&7 6 71 7 86 100 21-115 
Computational a BB: at Fr. Ss: Be 65 5- 69 
Scientific Ss &@ 2 8t&t TF Bai & HR PB 96 26-102 
Persuasive 41 47 51 5 59 64 69 74 81 «108 29-112 
Artistic 4 41 46 83 5&4 58 6 69 76 95 21- 98 
Literary 34 40 4 49 «5&3 57 =%62 G67 75 93 23- 96 
Musical o> ,. M.-F: Fe Rm aA # B-.B 46 3- 71 
Social Service 53 6 67 73 78 8 80 9 102 120 30-122 
Clerical a .8 @2 @B H&S OS @B 97 17-100 





Table 3 contains decile raw score norms for the 791 women who com- 
pleted the Preference Record. Intercorrelations were computed for the 
nine scales included in the inventory. These data are presented in Table 
4. With five exceptions—mechanical vs. scientific, mechanical vs. lite- 
rary, mechanical vs. clerical, computational vs. clerical, and artistic vs. 
social service—there is little evidence that correlations among the scales 
depart markedly from zero in so far as this sample is concerned. 
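
The two computations mentioned above, decile norms and scale intercorrelations, can be sketched as follows. The raw scores are invented, the function names are hypothetical, and the report does not say which interpolation rule was used for the deciles, so the default rule of Python's `statistics.quantiles` is only one possible convention.

```python
from math import sqrt
from statistics import mean, quantiles

def decile_norms(raw_scores):
    """Map each decile (10 through 90) to the raw score at that cut point."""
    cuts = quantiles(raw_scores, n=10)   # nine cut points
    return {10 * (i + 1): cut for i, cut in enumerate(cuts)}

def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical raw scores on two scales (not taken from the study).
mechanical = list(range(1, 100))
scientific = [2 * score + 5 for score in mechanical]

norms = decile_norms(mechanical)
r = pearson_r(mechanical, scientific)
print(norms)
print(f"r = {r:.2f}")
```

Applying `pearson_r` to every pair of the nine scales would give an intercorrelation matrix of the kind reported in Table 4.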

Military occupational sub-samples of women Reservists in the Marine 
Corps were compared statistically. These sub-samples were: Officer, 
line; Officer, staff; Officer, technical specialty; Aviation Machinist Mate; 
Assembly and Repair (Aviation); Motor Vehicle Operator; Cooks and 
Bakers; Synthetic Training Device Instructors; Stenographer; Clerk-
typist; Clerk, general; Duty Non-Commissioned Officer; and Other.⁴

Means and standard deviations for these military occupational groups 
are presented in Tables 5 and 6 respectively. 

Time did not permit a comparison of the mean of each occupational
group with the mean for every other occupational group. The mean for
each occupational group was, however, compared with the mean for the
total sample. The Preference Record did differentiate most of the occupational
groups from generality on one or more scales despite the fact that
data for the sub-group were not removed from the generality for the
comparisons. Occupational groups not differentiated significantly from
generality by at least one of the nine scales were officer, staff; cooks and
bakers; and synthetic training devices instructors. Table 7 presents “t’s”
for differences between means for each occupational sub-group and the
total sample for each of the nine interest areas.

4 The “Other” group consisted of 122 cases in military occupations the samples for
which were too small or the occupation of too little importance to warrant separate
treatment. Distributions for this occupational composite closely approximated those
for the total sample.

Table 7 is of particular interest because it illustrates a phenomenon
which has not received much attention in the literature concerned with
the measurement of interest: “rejection” or “aversion” scores. The
topic is not particularly germane to this report, but attention is called
to the fact that occupational groups are as clearly differentiated on interest
scales by their “rejection” scores as they are by their “acceptance” or
“interest in” scores. Bingham’s statement that a C grade obtained on an
occupational key of the Strong Vocational Interest Blank means “No”
may need change to “No, for that occupation; yes, for some others.”⁵

The rejection of items positively weighted for the clerical scale may be 
as important to the measurement of an interest in the duties of an Aviation 
assembly and repair worker as the acceptance of items positively weighted 
for the mechanical scale. 

The preliminary aspect of this investigation was a study of job satis- 
faction which indicated satisfied and dissatisfied groups of workers. 
Scores on the Preference Record scales were separated for these satisfied 
and dissatisfied groups. Scores on the inventory for these two groups 
showed marked differences between them on one or more interest scales. 
Tables 8 and 9 present these data for three groups of clerical workers. 
These tables make it obvious that occupational norm groups for a test of 
interests should be composed only of those who are satisfied with their 
jobs. Groups so constituted in this study have interest profiles much 
more clearly differentiated from the total standardization groups than 
sub-samples containing a large proportion of individuals disinterested in 
or dissatisfied with the type of work they are doing. 


Conclusions 


Although the major purpose of this report is to present norms, and 
statistics related to these norms, certain conclusions of general interest 
are presented here. 


5 Bingham, W. V., Aptitudes and aptitude testing, Harpers, 1937. Appendix, Section 
IX, p. 356. 

* None of the other military occupational sub-samples contained enough dissatisfied 
workers to make an analysis of this sort possible or necessary.

1. Certain occupational sub-samples of women Reservists in the
United States Marine Corps can be differentiated by scores on various 
scales of the Kuder Preference Record. 

2. In the case of three sub-groups of clerical workers—stenographers, 
clerk-typists, and clerks, general—certain scales on the Preference Record 
differentiate satisfied from dissatisfied workers. In each of these three 
sub-groups, the satisfied workers are more clearly differentiated from the 
total group of 791 than are sub-groups containing both satisfied and 
dissatisfied clerical workers. 

3. “Rejection” (low or negative) scores on certain scales differentiate 
occupational groups from the generality as markedly as do the “accept- 
ance” (high or positive) scores. 

4. Group differentiation is a matter of patterns or profiles which are 
characterized by both “acceptance” and “rejection” scores.

5. Comparison of individual profiles with occupational sub-sample 
profiles permits a surprisingly good interest screen for use in the assign- 
ment of women Reservists to military duty. 


Received July 5, 1944.

Personality Patterns of Adolescent Girls: I. Girls Who Show
Improvement in IQ * 


Dora F. Capwell 
Trainee Acceptance Center, Public Schools, Pittsburgh, Pa. 


The research described in this report was designed to discover what 
differences there are between those adolescent girls whose IQ’s change 
significantly upon retest and those whose IQ’s show only slight changes. 
Its purpose, therefore, is to throw additional light upon the problems of 
individual, clinical diagnosis, which inevitably involves or at least implies 
prediction of a future level of performance. The determination of what 
factors within the individual affect test scores is a more narrowly defined 
problem than the one of mere constancy of IQ, measured in terms of group 
data, and is, of course, significantly related to accuracy of diagnosis, which 
is preliminary to the planning of effective educational and social treatment. 

Literally hundreds of studies have been reported which contain retest 
data from intelligence tests. They have been reviewed and summarized 
by Baldwin (2, 3), Burks (6), Foran (9, 10), Nemzek (24), and Thorndike 
(31). Although the studies have dealt with all age levels, various time 
intervals, several different tests, and a wide variety of testing conditions, 
the coefficients of correlation between test and retest have ranged from 
.63 to .95, most groups showing correlations of .84 or better. Treated by
the correlation technique, the data consistently showed high positive cor-
relation between test and retest, and hence the authors assumed that they
demonstrated the relative constancy of the IQ.

The studies which are of special interest here are the ones which at- 
tempt to determine the causes of change in level of test performance in 
those cases which showed variability of IQ. These may be divided into 


* The writer is indebted to the Department of Psychology, University of Pennsyl- 
vania, and particularly Drs. Malcolm G. Preston, Francis W. Irwin, and Miles Murphy, 
who served as research advisers in the final stages of the work, for advice and constructive 
criticism. 

Gratitude is due to the Bureau of Psychological Services, State of Minnesota, which 
sponsored the study. Valuable assistance was given by the late Dr. Fred Kuhlmann and 
his successor, Dr. Stuart Cook. The staff of the State School for Girls at Sauk Centre, 
Minnesota, and the Sauk Centre Public Schools gave their wholehearted cooperation. 

Dr. Starke R. Hathaway, University of Minnesota, made the Multiphasic Personality 
Inventory available to the writer before its publication and was helpful throughout the 
period of collecting data.

three groups: studies which include special analysis of the extreme cases in
the distribution of IQ changes (7, 13, 20, 22, 25, 32), studies made on 
selected groups of problem children (5, 12), and studies of the effect of 
specific factors, such as attitude, emotion, and psychopathic personality 
(5, 11, 16, 19, 23, 27, 28). All of these studies have been characterized 
by a lack of objective data aside from the intelligence test scores. Most 
frequently causes of IQ variability have been assigned by surveying the 
history, considering the events which have taken place between tests, and 
making a highly subjective judgment. No study reported convincing 
evidence that personality factors are functionally associated with changes 
in IQ on retest, although many authors drew that conclusion from what is 
here considered insufficient evidence. 

The present study attempts to present a more objective description of 
the individuals studied by utilizing scores on a variety of tests, analyzing 
items of possible significance in the history, and also analyzing the record 
of personal-social events which occurred between examinations. The use 
of both a normal group and a definitely maladjusted group permits a more 
thorough exploration of the significance of a wide variety of concomitant 
factors. Although the study was planned to investigate factors related to 
IQ variability, the results led to a study of those whose IQ improved as 
compared with those whose IQ remained relatively constant. Three 
major questions will be asked of the data. 

(1) Are there significant differences of personality and experiential 
background between those adolescent girls whose IQ shows significant 
improvement and those whose IQ changes only slightly? 

(2) Is significant improvement in the IQ of adolescent girls related 
to improvement in the adjustment pattern of the total personality? 

(3) Are the relationships which are investigated first in a delinquent, 
institutionalized, adolescent, female group demonstrable in a non-delin- 
quent, adolescent, female group in the community? 


Procedure 


Subjects. The subjects were 101 delinquent girls and 85 non-delin- 
quent girls who were between ages 12 and 18 and whose IQ on the first test 
was not less than 60. The delinquents were consecutive admissions to 
the Minnesota State School for Girls, beginning in September, 1941. 
The non-delinquents were in the consolidated public school at Sauk Cen- 
tre, Minnesota, and were chosen from grades which would match the 
usual grade distribution of girls entering the State School. The principal 
selected every other girl from the grade lists until the necessary number 
was obtained. The groups were roughly equated for urban-rural back- 
grounds. Both schools have a population which is about two-thirds 
urban and one-third rural, when their places of residence are classified 
according to the U. S. Census criterion of urban and rural. 

The method of selection resulted in two groups of the following descrip- 
tion. The delinquents ranged in age from 13 to 19 with a median age of 
16. The non-delinquents ranged from 12 to 18 with a median age of 15. 
The slight difference in age range is a reflection of the fact that the delin- 
quents show more school retardation, so that their average age for each 
grade is a little older. The median grade placement of the delinquents 
was 9th grade; 86% were in grades 7 to 10, inclusive, 12% were above 10th 
grade, and 2% were below 7th grade. The median grade placement of 
the non-delinquents was also 9th grade, with 94% in grades 7 to 10, 
inclusive, and 6% above 10th grade. 

Collection of Data. The data consist of test scores obtained on two 
psychological examinations, items of classification from the personal- 
social history, and items of classification related to progress between 
examinations. The delinquents were examined the first time within their 
initial two weeks at the School, and the second examination was given 
from 6 to 15 months later. The non-delinquents were examined the first 
time in the fall of 1941 and were re-examined from 4 to 13 months later. 
The tests given at both examinations were the Kuhlmann Tests of Mental 
Development (18), the Minnesota Multiphasic Personality Inventory 
(14), the Washburne Social Adjustment Inventory (33), and the Pressey 
Interest-Attitude Test (26). Three tests which each subject took only 
once were the Terman-Miles Test of Masculinity-Femininity (30), the 
Vineland Social Maturity Scale (8), and the Stanford Achievement Test 
(17). Three testing sessions were used to complete the entire battery for 
each examination. All of the tests were administered by the writer with 
the exception of the delinquents’ Stanford Achievement Tests, which are 
given routinely by the school principal at the State School. 

From the personal-social history of the delinquents tabulations were 
made of the occupation and education of parents, national and racial 
background, language spoken in home, work experience, length of time 
out of school, type of delinquency, and other social problems within the 
family. Information on the non-delinquents was obtained by personal 
interviews and from the school records regarding occupation and educa- 
tion of parents, language spoken in the home, work experience, and any 
social problems in the family. For the delinquents there were also Home 
Ratings made by the State School field workers on a large proportion of 
the cases. 

For the interval between tests the record of the delinquent cases 
included health status, school grades, discipline reports, and ratings of 
work habits and general conduct made by the housemother, work super- 
visor, and supervisor of home life. For the non-delinquents there was a 
record of health history and school grades. The latter were recorded 
from the principal’s records, and the student’s report on health was 
checked by records in the office of the school nurse. 


Results 


Intelligence Test Results. The Kuhlmann Tests of Mental Develop- 
ment were used as the measure of intelligence, and, hence, the main 
characteristics of these tests should be kept in mind when examining the 
results. When applied at the age levels used in this study, the scale 
consists of sixteen tests, all of which are timed, and all of which are scored 
at successively higher levels, depending upon the amount accomplished 
in a given time interval, usually two minutes. Five of the tests demand 
the use of spatial relations and of spatial imagination. The others are 
more strictly verbal. Each test is given a time score and an accuracy 
score as well as a mental age score. Sixteen is the highest chronological 
age denominator used in computing IQ’s. The results of these groups 
are summarized in Table 1. 


Table 1 
Kuhlmann Tests of Mental Development 

                 Delinquents                  Non-Delinquents 
            Mean IQ    S.D.     σm        Mean IQ    S.D.     σm 
1st Test     87.40    17.10    1.70       101.88    17.59    1.90 
2nd Test     95.65    19.62    1.95       111.76    20.50    2.22 





The non-delinquents are significantly brighter than the delinquents on 
both tests. D/σD on the first test is 5.70, and on the second one it is 5.46. 
Despite the difference in mental level, each group showed about the same 
amount of shift in IQ on the second test. The average amount of change 
was an increase of 8 points for the delinquent group and 10 points for the 
non-delinquent group. 
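
The critical ratios quoted above can be checked against the figures in Table 1. The following is a minimal sketch, assuming that the third column of Table 1 is the standard error of the mean (S.D. divided by the square root of N) and that the critical ratio is the difference between two means divided by the standard error of that difference for independent groups. The paper does not state its formulas, and the function names are illustrative only.

```python
import math


def standard_error_of_mean(sd, n):
    """S.D. / sqrt(N); assumed to be the third column of Table 1."""
    return sd / math.sqrt(n)


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Difference between two means over the standard error of the
    difference, treating the two groups as independent (an assumption)."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Group sizes from the Subjects paragraph; means and S.D.s from Table 1.
N_DELINQUENT, N_NON_DELINQUENT = 101, 85
table_1 = {
    "1st Test": ((87.40, 17.10), (101.88, 17.59)),
    "2nd Test": ((95.65, 19.62), (111.76, 20.50)),
}

ratios = {}
for test, ((m_del, sd_del), (m_non, sd_non)) in table_1.items():
    se_del = standard_error_of_mean(sd_del, N_DELINQUENT)
    se_non = standard_error_of_mean(sd_non, N_NON_DELINQUENT)
    ratios[test] = critical_ratio(m_non, se_non, m_del, se_del)
    print(f"{test}: critical ratio = {ratios[test]:.2f}")

# Mean gain from first to second test within each group.
gain_delinquent = 95.65 - 87.40
gain_non_delinquent = 111.76 - 101.88
print(f"mean gains: {gain_delinquent:.2f} and {gain_non_delinquent:.2f} points")
```

Under these assumptions the ratios come out within a few hundredths of the reported 5.70 and 5.46, the small differences being what one would expect from rounding of intermediate values, and the mean gains round to the 8 and 10 points given in the text.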

The coefficient of reliability was computed with the test-retest scores 
for each group. Product moment correlations also were computed to 
determine the relationship of IQ changes to time interval, change in speed 
score, change in accuracy score, and level of first IQ (Table 2). The 
time interval between tests and the change in accuracy score had a negli- 
gible relationship to the changes in IQ, but there was a marked relation- 
ship between speed score and changes in IQ. The reliability coefficient is 
high enough to be considered satisfactory for a test used for individual 
diagnosis. It is of interest in passing that these are the first reliability 
coefficients reported on the Kuhlmann Tests of Mental Development.¹ 
The correlation between IQ change and level of first IQ is so low that no 
relationship between them is indicated, implying that the distribution of 
changes at various IQ levels is not affected by the IQ level itself. 


Table 2 
Product-Moment Correlations Based on the Results from the Kuhlmann 
Tests of Mental Development 

                                           Delinquents    Non-Delinquents 
Test-retest reliability                     .90 ± .01       .88 ± .01 
IQ change and time interval                -.02 ± .06       .06 ± .07 
IQ change and change in accuracy score      .18 ± .06       .15 ± .07 
IQ change and change in speed score         .44 ± .05       .41 ± .06 
IQ change and level of first IQ             .08 ± .06       .15 ± .07 





We find that the IQ’s of approximately 36% or 36 of the 101 delin- 
quents changed more than 10 points, and 48% or 41 of the 85 non- 
delinquents changed more than 10 points.² Inasmuch as the standard 
error of a score is 5.4 for the delinquents and 6.7 for the non-delinquents, a 
change of more than 10 points would occur by chance only 5 times in 100 
for the delinquents and 10 times in 100 for the non-delinquents. A 
change of more than 10 points, then, with this group may be considered a 
significant change. In presenting the results of the other tests, we shall 
divide each major group into two subgroups: those whose IQ changed 
10 points or less, called the constant group, and those whose IQ changed 
more than 10 points. These latter groups should be thought of as vari- 
able groups in contrast to the constant ones, but since only one delinquent 
and one non-delinquent had IQ’s which regressed more than 10 points, 
the remainder really improved, and the groups are essentially groups 
which showed significant improvement in IQ as against those whose IQ 
did not improve. Hence, the variable groups will be designated the 
improved groups. 
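
The reasoning behind the 10-point criterion can be illustrated numerically. The sketch below assumes the conventional standard error of measurement, S.D. × sqrt(1 − r), and a two-sided normal-curve probability; the paper does not say how its standard errors or chance figures were obtained, so both the formula and the function names are assumptions made for illustration. Only the delinquent group is worked, using the first-test S.D. from Table 1 and the reliability from Table 2.

```python
import math


def standard_error_of_score(sd, reliability):
    """Conventional standard error of measurement: S.D. * sqrt(1 - r).
    Assumed here; the source does not state its formula."""
    return sd * math.sqrt(1.0 - reliability)


def chance_of_larger_change(points, se):
    """Two-sided normal-curve probability of a deviation larger than
    `points`, given a standard error `se` (an assumed model)."""
    z = points / se
    return math.erfc(z / math.sqrt(2.0))


# Delinquent group: first-test S.D. (Table 1), test-retest r (Table 2).
se_delinquent = standard_error_of_score(17.10, 0.90)
p_delinquent = chance_of_larger_change(10, se_delinquent)
print(f"standard error of a score: {se_delinquent:.1f}")
print(f"chance of a change of more than 10 points: {p_delinquent:.2f}")
```

With these inputs the standard error rounds to the 5.4 quoted for the delinquents, and the chance probability is of the same order as the 5 times in 100 cited in the text.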

Achievement Test Results. In the case of all three achievement scores 
the delinquents have a lower grade level of achievement than the non- 
delinquents, and in both of the major groups the constant IQ group has a 
lower achievement level than the improved group. Table 3 demonstrates 
these relationships, not in terms of absolute achievement test scores, but 
in terms of the relation of the individual’s achievement level to his actual 
grade level. The actual grade level of the non-delinquents was figured on 

1 For a discussion of why they were omitted in the original report of the Kuhlmann 
Tests, see (18), pp. 16-17. 
2 D/σDp for those two percentages is 1.71. 
the basis of the time of the school year when the test was given, and in the 
delinquent group it was figured as closely as possible by the grade place- 
ment at the time of the year when the girl left school. The difference 
between actual grade level and the achievement test score gives a “differ- 
ence achievement score.” These have been averaged for each group and 
are presented in Table 3. When compared with the norms for the Stan- 
ford Achievement Test, each group shows some retardation in grade 
achievement. It should be recognized, however, that this is true partly 
because the ceiling of the test is too low for a few of the group who were in 
10th, 11th, and 12th grades and who made scores above the maximum 
grade level of the test. No doubt the critical ratios between delinquents 
and non-delinquents would be slightly higher if the norm did not stop at 
the 11.0 grade. 

Personality Test Results. The results of the other tests used in the 
battery provide data for answering the first question posed; namely, are 
there significant differences of personality, as measured by this group of 
personality tests, between those adolescent girls whose IQ’s show signifi- 
cant improvement and those whose IQ’s show only slight variability? 

The Minnesota Multiphasic Personality Inventory is scored on eleven 
scales, three of which are checks on the validity with which the subject has 
answered the Inventory. These are the scores for “?’s,” L, and F. The 
“?” score is the sum of items put in the “Cannot say” category rather than 
answered as true or false. L is a lie score, which if too high, shows that 
the subject is attempting to present too favorable a picture of himself. 
The F-score is a check on how many extremely unfavorable items are 
included in the score; it includes items which normally are affirmed by 
only a few cases in a thousand. Although the F-score tends to go up as 
maladjustment increases, an extremely high F-score suggests carelessness, 
lack of comprehension, deliberate falsification, or scoring errors. The 
other eight scales are measures of specific abnormal tendencies, grouped 
under familiar psychiatric classifications. Their letter abbreviations have 
the following meanings: Hs, hypochondriasis; D, depression; Hy, 
hysteria; Pd, psychopathic deviate (formerly called psychopathic per- 
sonality); Pa, paranoia; Pt, psychasthenia; Sc, schizophrenia; and Ma, 
mania. 

Table 4 shows the significance of difference between the subgroups of 
each major group. The ratios do not indicate a significant difference 
between the results for the constant and improved groups, but the test 
did discriminate extremely well between the delinquent and non-delin- 
quent groups. The differentiation which this scale and the other per- 
sonality tests made between delinquents and non-delinquents will be 
presented in a subsequent report. 
Table 4 
Minnesota Multiphasic Personality Inventory: Significance of Differences of Raw Scores 

               Delinquents, Diff.             Non-Delinquents, Diff. 
               between Improved and           between Improved and 
               Constant Groups                Constant Groups 
               First Test    Second Test      First Test     Second Test 
Scale          D/σD          D/σD             D/σD           D/σD 
?-Score         .12           .55              .27           1.03 
L-Score         .48           .01             [illegible]    1.00 
F-Score        3.30          2.07              .76           1.24 
Hs-Scores       .26          1.96              .35            .53 
D-Scores        .96           .40              .61           1.94 
Hy-Scores      1.75           .92             1.06           1.13 
Pd-Scores       .30           .62              .22            .24 
Pa-Scores       .61          1.60              .25            .67 
Pt-Scores       .00          1.35              .43            .64 
Sc-Scores       .43          1.18              .50            .14 
Ma-Scores      1.20           .13             2.25           2.03 





The same treatment has been given the scores from the other per- 
sonality tests, namely, the Washburne Social Adjustment Inventory, the 
Pressey Interest-Attitude Test, and the Terman-Miles Test of Mascu- 
linity-Femininity. The Vineland Social Maturity Scale is a different type 
of test from these others, but may be grouped with them, and Table 5 
shows the significance of differences between each group on both examina- 
tions. The Washburne, the Terman-Miles, and the Vineland do not show 
significant differences between the constant and improved groups. That 
the Pressey shows high critical ratios between the constant and improved 
delinquents but not the non-delinquents and does not discriminate be- 
tween the total group of delinquents and non-delinquents appears to be 
a chance phenomenon. The Terman-Miles shows no significant differ- 
ences between any groups. 


Table 5 
Other Personality Tests: Significance of Difference of Raw Scores 

                               Delinquents,         Non-Delinquents, 
                               Diff. between        Diff. between 
                               Constant and         Constant and 
                               Improved Groups      Improved Groups 
Test            Test No.       D/σD                 D/σD 
Washburne       1st             .60                 1.06 
Washburne       2nd             .40                  .06 
Pressey         1st            2.79                  .15 
Pressey         2nd            3.59                  .12 
Terman-Miles    (one only)     1.22                  .78 
Vineland *      (one only)      .49                  .21 

* D/σD between social quotients. 

Personal History. The second part of question (1) is: are there signifi- 
cant differences of background between those adolescent girls whose IQ 
showed significant improvement and those whose IQ varied only slightly? 
The first item on which they are to be compared is the occupational level 
of the father. In cases where the father is deceased or not with the 
family and the mother works, the occupation of the mother was rated. 
The occupations were classified according to the Taussig (29) scale, which 
has been used by Hildreth (15) and others. Chi square was computed, 
in a five by two table, to find out if there is a reliable difference in the 
distribution of occupational level of parents between delinquents and non- 
delinquents and the constant and improved subgroups of each. Between 
parental occupations of delinquents and non-delinquents P-value is less 
than .01. Application of chi square to constant and improved delinquent 
groups yields a P-value of .60, and applied to constant and improved non- 
delinquent groups the P-value is .04. 

The indications are, therefore, that there is a reliable difference be- 
tween the occupational level of the parents of the delinquents and non- 
delinquents, the occupational level being higher for the parents of non- 
delinquents. There is not a reliable difference between occupational 
levels of the constant and improved delinquent groups, but with the 
non-delinquents the difference approaches significance, the parents of the 
improved group tending to have a higher occupational level than those 
of the constant group. 

The parents’ education is another item which was investigated as far 
as possible with each group, but the educational information is more 
limited than the occupational information. In only 47 of the 101 delin- 
quent cases and 77 of the non-delinquents was it possible to obtain in- 
formation regarding parents’ education. The chi square test was applied 
in a four by two table and yielded a P-value between .01 and .02 between 
the education of the delinquents’ parents and the parents of the non- 
delinquents, indicating a reliable difference in favor of the non-delinquents. 
The subgroups within the two larger groups are so small, due to the 
incompleteness of the data, that chi square is not an appropriate test of 
differences, and for the same reason more elaborate treatment does not 
seem indicated. 

Differences in race, nationality, and language were so small as to be 
insignificant in all groups. Only nine of the delinquents and one non- 
delinquent had one or both parents of foreign birth. Five of each major 
group were not of the white race. Thirteen delinquents and two non- 
delinquents came from bi-lingual homes. These few girls were about 
evenly distributed between the constant and improved groups. Work 
experience is another factor which differed markedly between delinquents 
and non-delinquents, the delinquents having worked more, but it had no 
apparent significance with respect to the two IQ groups. 

By means of the records at hand and personal interviews an attempt 
was made to tabulate any social problems within these girls’ families, 
including problems of health, mental retardation, mental instability, and 
social maladjustment, such as delinquency or criminality. Of the delin- 
quents 72% of the constant IQ group had social problems within the 
family and 72% of the improved group. Of the non-delinquents 23% 
of the constant group and 20% of the improved group had family problems 
of the types mentioned. Again the same pattern is seen wherein there is 
a conspicuous difference between delinquents and non-delinquents but little 
or none between subgroups of each. 

Three additional items were tabulated for the delinquents only. The 
type of delinquency was not related to IQ changes, inasmuch as 91% of 
the constant group and 94% of the improved group were committed for 
sex delinquency. Ratings on the home conditions of the delinquents were 
made by field workers from the State School who visit the home of each 
girl who has been committed. They used an adaptation of the Whittier 
Scale for Grading Home Conditions (34), which includes ratings on Neces- 
sities, Neatness, Size, Parental Conditions, and Parental Supervision. 
The ratings were examined to see if those girls whose IQ improved after 
a period in the institution came from the poorest or most unsatisfactory 
homes. There were no striking differences between the homes of the 
girls in the two subgroups. Ratings on the first three categories covered 
the entire range, but on the latter two, Parental Conditions and Parental 
Supervision, there was piling up on the unfavorable end of the scale. 
The third item considered for the delinquents only was the length of time 
out of school prior to commitment. The Kuhlmann Tests are pencil-and- 
paper tests, much like school work, and it was thought that a girl who had 
been out of school for some time might find it easy to better her score after 
a return to school. However, classifying the girls’ length of time out of 
school in a four part table and applying the chi square test, a P-value of 
between .70 and .80 was obtained, so there is no evidence that length of 
time out of school prior to taking the tests had any relation to ability to 
improve the score. 

Personality Tests (Changes from First to Second Examination). The 
second major question to be answered is whether significant improvement 
in the IQ of adolescent girls is related to improvement in the adjustment 
pattern of the total personality. Our data for answering this question are 
the amount of change in the scores of those three personality tests which 
were given twice and the relation of these changes to changes in IQ. 
A second type of data is the analysis of personal-social events between 
examinations. 

The amount of change in scores on the personality tests was averaged 
for the various groups and compared. Table 6 shows that changes in 
personality adjustment as measured by scores on this group of personality 
tests occurred no more frequently in the improved groups than the 
constant groups. More detailed examination of the score changes shows 
that there was a slight but consistent tendency for the delinquents’ scores 
to shift more toward the mean for the normal population on the second 
test, but the average change was so slight and also so evenly distributed 
with relation to constant and improved groups that it is not of significance 
for the present problem. 


Table 6 
Significance of Difference between Mean Changes 
of Score on 1st and 2nd Personality Tests 

                               D/σD                    D/σD 
                               Changes of Constant     Changes of Constant 
                               and Improved            and Improved 
Test                           Delinquents             Non-Delinquents 
Minnesota Multiphasic 
  ?-Score                      1.02                     .53 
  L                            [illegible]              .33 
  F                            1.30                     .58 
  Hs                           1.90                     .21 
  D                            1.63                    2.05 
  Hy                           1.17                    2.00 
  Pd                           1.28                     .20 
  Pa                           1.82                     .31 
  Pt                           1.49                     .39 
  Sc                            .83                     .10 
  Ma                           1.86                     .35 
Washburne S.A. Inventory       1.38                    2.40 
Pressey Interest-Attitude       .87                     .09 

Personal-Social History in Interval between Tests. Health frequently 
is mentioned as a cause of changes in test performance. The health 
record for the delinquents was available at the School. With the non- 
delinquents a classification was made on the basis of the report of each 
girl on number of illnesses, which was checked by records of the school 
nurse, who attempts to record causes for each absence resulting from ill- 
ness. In the delinquent groups 23% of the constant IQ group and 45% 
of the improved IQ group received treatment for major health conditions, 
including pregnancies. Chi square applied to all classifications yielded a 
P-value between .10 and .05, and D/σD between the major treatment 
categories of each group was 2.24. Although there is little likelihood of 
the difference occurring by chance, neither test quite meets the criterion 
for a reliable difference. Twenty-six cases in the entire group were fitted 
with glasses between examinations. They were about evenly divided 
between the constant and improved groups, but their average IQ change 
was 11 points, which is 3 points higher than the average. 
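
The critical ratio of 2.24 between the two percentages receiving major treatment can be approximated as follows. This is a sketch under stated assumptions: it uses the unpooled standard error of a difference between two independent proportions, which the paper does not name as its method, and it infers the subgroup sizes (65 constant, 36 improved) from the earlier statement that 36 of the 101 delinquents changed more than 10 points. The function name is illustrative.

```python
import math


def critical_ratio_of_percentages(p1, n1, p2, n2):
    """Difference between two proportions divided by the (unpooled)
    standard error of that difference; independent groups assumed."""
    se_difference = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se_difference


# Subgroup sizes are inferred, not reported alongside the percentages.
N_IMPROVED = 36
N_CONSTANT = 101 - N_IMPROVED

# 23% of the constant group and 45% of the improved group were treated.
ratio = critical_ratio_of_percentages(0.23, N_CONSTANT, 0.45, N_IMPROVED)
print(f"critical ratio = {ratio:.3f}")
```

Under these assumptions the result falls within about one hundredth of the reported 2.24, short of the value of 3 that was conventionally required of a critical ratio, which agrees with the statement that the difference does not quite meet the criterion for reliability.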

The chi square test applied to the classification of health history for 
the non-delinquents yielded a P-value of .30, as compared with about .08 
in the delinquent group. No direct comparison between delinquents 
and non-delinquents can be made because of the lack of uniformity in the 
type of records, but there is more evidence that health was a significant 
factor in the delinquent than in the non-delinquent group. 

School grades of both delinquents and non-delinquents were analyzed 
in terms of trends in the interval between psychological examinations. 
Thirty-five of the delinquents had no school experience in this period, but 
of those who did there was no significant difference in the trend of their 
grades between the constant and improved IQ groups. In the non- 
delinquent group, too, the trend of grades in terms of improvement or 
the lack of it was no different in the improved than the constant IQ group. 

Three other factors were examined in the delinquent group as possible 
indicators of adjustment within the institution, namely, records of disci- 
pline, work ratings, and behavior ratings. These are records kept rou- 
tinely by the institution. The average number of disciplinary incidents 
for the constant group was 6.32 per person, and for the improved group 
it was 7.25 per person. Classification of types and quantity of discipline, 
tested by chi square, yielded a P-value between .20 and .10, which is not 
low enough to allow one to draw any definite conclusion about the group 
differences. When work ratings and behavior ratings were classified in 
terms of trends of improvement, lack of improvement, or poorer than 
before, the percentage of girls in each classification was almost identical 
for the constant and improved groups. Hence, it is not indicated that 
the group whose IQ improved showed any more improvement in adjust- 
ment to the institution than those whose IQ remained constant. 


Discussion 


The basic criterion for grouping these cases, of course, was the differ- 
ence in the scores made on two intelligence tests. On the first test the 
delinquents proved to be somewhat below average, which is consistent 
with other reports of the intelligence level of delinquents, but the non- 
delinquents had a mean intelligence score close to the mean for the general 
population. The test-retest reliability coefficient was satisfactorily high 
on each group, but slightly better with the delinquents than the non- 
delinquents. The correlation between IQ change and time interval is so 
close to zero that it is impossible to say that the large changes in IQ are 
due to practice effect in the usual sense of the term, since that would 
result in a negative correlation between time interval and IQ change. 
Yet 36% of the delinquents and 48% of the non-delinquents showed a change of more 
than 10 points in IQ on retest. This result, while indicating more change 
than that reported by many investigators, notably those using the 1916 
Stanford-Binet, is by no means unprecedented. Lowell (21) with her 
very large group of cases reported that 71% changed 7 points or more, 
suggesting that the mean change must have been considerably greater 
than that. Brown (5) reports an average change of 6 points, which is 
only slightly below the average for this study, and Allan and Young (1), 
using the Terman-Merrill, report results identical with the ones of this 
study which used the Kuhlmann. They found an average gain of 8 
points with 48% of the group varying more than 10 points. Variability 
such as these studies and the present one reveal is not simply variation 
around the mean inasmuch as the average change is a gain of 8 or 9 points, 
when computed with retention of the sign rather than change averaged 
regardless of sign. This gain is not to be explained by the conventional 
techniques of correlating IQ change with time interval or level of first IQ, 
as mentioned before. The range of IQ changes is quite similar whether 
based on the cases retested at 4 to 6 months or those retested at an interval 
of one year or more. The accuracy score on the Kuhlmann proves to be 
a rather stable index of the individual’s method of working, providing 
he is given tests properly ranged for his level of ability, but the speed score 
does have a positive correlation of .41 to .44 with changes in IQ. 

The achievement tests were used to find out whether those girls whose 
achievement level was least retarded were among those who were able to 
raise their IQ’s on retest. As mentioned in the statement of results, the 
relationship of achievement to grade level is somewhat distorted by the 
fact that the upper ceiling of the test is grade 11.0. For this reason more 
significance is attached to the critical ratios than would be warranted 
without this factor which depressed some scores at the upper end of the 
distribution. With both delinquents and non-delinquents the improved 
group showed less retardation in achievement than those whose IQ’s on 
retest stayed about the same. It appears that achievement level is more 
significant in this respect than the level of the first IQ. 

The personality test which provided the richest amount of material 
regarding individual adjustment was the Minnesota Multiphasic Per- 
sonality Inventory. Although it discriminated well between delinquents 
and non-delinquents, it gave no significant differences in either main 
group between those cases wherein the IQ improved significantly and 
those in which it did not. The same pattern is borne out by the com- 
parisons of total scores on the Washburne Social Adjustment Inventory. 
As adjustment is measured by these tests, there is no greater maladjust- 
ment in one group than in the other. 

The Terman-Miles Test of Masculinity-Femininity showed no signifi- 
cant differences between any groups, and the Vineland Social Maturity 
Scale revealed no differences in social age between those whose IQ im- 
proved and those whose IQ remained constant. The Pressey Interest- 
Attitude Test behaved atypically from the other tests in showing some 
difference between constant and improved delinquents and none between 
constant and improved non-delinquents. Moreover, it did not discrim- 
inate between delinquents and non-delinquents. Hence, four out of five 
personality tests showed no evidence whatsoever that there were marked 
differences in personality adjustment at the time of the first test between 
those whose IQ showed subsequent improvement and those whose IQ 
remained constant. The fifth test, the Pressey, shows such conflicting 
results that no conclusion may be based on it. 

In general the results of the personality tests give a negative answer 
to the question of whether improvement in the adjustment pattern of the 
individual is related to improvement in IQ, although there is a slight 
tendency for the scores of the improved delinquents to move more toward 
the normal end of the distribution of scores than do the scores of the 
constant delinquents. With the non-delinquents, whose personality test 
scores were very normal on the first test, the differences in amount of 
score changes are not so large nor are they always in the same direction, 
which suggests that they are chance fluctuations. The differences with 
the delinquents are not great enough to be statistically reliable nor to lend 
real weight to the hypothesis that improvement in general adjustment 
results in improvement of mental test performance. 

As a further attempt to find out whether changes in IQ are related 
to changes in other factors which are indicative of the level at which the 
individual is functioning, in other words his total personality adjustment, 
a record was made of certain other items concerning the period between 
tests. Studies of IQ changes frequently mention physical health as a 
significant factor. The health record of the delinquents does not show a 
reliable difference between the groups, but the critical ratio is high enough 
to support the belief that physical condition may be a factor with bearing 
on efficiency of test performance. The present results lend support to the 
opinion of many psychologists that a test given at a time when an indi- 
vidual is not physically up to par should be repeated at a more favorable 
time. No one can judge with accuracy just which physical ailments 
affect mental performance and which do not. 
Returning to the questions posed at the beginning of the study, we 
shall sum up the results within that framework. The only differences of 
personality and experiential background between those adolescent girls 
whose IQ’s show significant improvement and those whose IQ’s remain 
relatively constant are that those who do not improve so much on the 
second test tend to have more retardation in level of school achievement 
than the variable group, and they tend to come from homes in which the 
parents have a lower level of education and occupation. Of the changes 
occurring between tests, there were no significant changes in the improved 
group as compared with the constant IQ group of the non-delinquents. 
In the delinquent group those who showed the greatest improvement in 
IQ were the group who had a poorer health record, so presumably it con- 
tained more girls who were not in good physical condition at the time of 
the first test. 

This lack of proof for the significance of emotional factors in the test 
situation, at least for adolescent girls, carries some implications for the 
clinician. In recent years whenever a clinical psychologist cannot find 
any evident reason why an IQ changed significantly, it has become the 
custom to fall back on the idea that the individual may be better adjusted 
at the time of the second test, a statement rarely backed up by any data. 
Although there is a possibility that in some cases this is true, it is by no 
means a common or easily demonstrated cause for shifts in IQ at the 
adolescent or adult level. The problem in young children may be quite 
different and also with older adults, who often are tested at the time of a 
court trial or some other emotional strain. The latter situation is some- 
what similar to the situation of these delinquents when given their first 
test, but one is not privileged to generalize these conclusions beyond the 
adolescent group. The great emphasis put on emotional factors by psy- 
chologists, psychiatrists, and social workers has made all persons dealing 
with maladjusted individuals particularly conscious of them, and the 
present study in no way minimizes their importance to the individual or as 
a determiner of behavior, but only minimizes their effect upon mental test 
performance. 


Summary and Conclusions 


Two groups of adolescent girls, 101 delinquents newly admitted to a 
State School for Girls and 85 public school girls, whose ages ranged from 
12 to 19 and whose median grade placement was 9th grade, were given an 
individual intelligence test, an achievement test, and a battery of per- 
sonality tests. They were retested from 4 to 15 months later. Delin- 
quents and non-delinquents were divided into subgroups composed of 
those whose IQ’s on retest changed 10 points or less, called the constant 
groups, and those whose IQ’s improved more than 10 points, called the 
improved groups. Differences of personality adjustment were measured 
by computing significance of difference of the mean scores for each group 
on the personality tests and by classifying and measuring the significance 
of difference of other items, such as occupation and education of parents, 
racial and national background, language in the home, work experience, 
length of time out of school, type of delinquency, other known social 
problems in the family, and home ratings. In the interval between tests 
school grades and health records were compared, and—in the delinquent 
groups—discipline records, and work and behavior ratings. Differences 
were computed between the mean amount of change in scores on per- 
sonality tests given in the second examination. 

The following conclusions appear to be warranted: 

1. The non-delinquents showed as large shifts in IQ on retest as the 
delinquents. 

2. In both delinquent and non-delinquent groups those who showed 
less improvement in IQ on retest showed the greater amount of achieve- 
ment retardation. 

3. There were no significant differences in the degree of adjustment at 
the time of the first test or in the experiential background between those 
who improved on retest and those whose IQ remained relatively constant. 

4. There were no significant differences between the constant and 
improved IQ groups in relation to the personal-social history which took 
place in the interval between tests. 

5. There were no significant differences of personality adjustment 
between the constant and improved IQ groups at the time of the second 
test. 

6. Concomitant personality factors, as measured by these tests and the 
criteria set up by this study, do not have any demonstrable effect on the 
changes in mental test performance between test and retest of adolescent 
girls. 

Received June 12, 1944. 
Studies in the Symbolism of Voice and Action: V. The Use of 
Behavioral and Tonal Symbols as Tests of Speaking 
Achievement 


Franklin H. Knower 


University of Iowa 


There are two general lines of procedure which may be followed in 
initiating a program of research in test construction. First, one may begin 
with a large number of items selected because of some presumed diagnostic 
value in the test, and then, through a program of item analysis, select and 
retain those items which contribute most to the value of the test. In 
following the second method, one begins with a limited number of items, 
retains those found to be useful and then adds to them as the evidence 
indicates a need for additional items to improve the value of the test. 
The first method may be said to be most satisfactory in dealing with tests 
of a nature comparable to those of already known value. The second 
method is to be preferred in the development of tests which are essentially 
different from those of established patterns, for unless a test of a limited 
number of items can be shown to have some usefulness, the possibility 
that a test of a greater number of items will be of value is immediately 
open to question. 

The second method just described has been employed in the work on 
the tests used in this project. The data to be presented are to be inter- 
preted, therefore, as preliminary findings. 

The use of tonal and behavioral symbols as a supplement to linguistic 
means of communication has long been considered by critics a significant 
factor in the process of speaking. A number of recent studies have 
supported this impression. 

Monroe¹ found that two of the more important elements in the second 
factor revealed by his factor analysis of the characteristics of good speech 
were “Used good gestures” and “Voice not monotonous.” Barnes² reports 
“Monotonous voices” and “Poor bodily control” as relatively high 
frequency faults among the speech students he studied. In a study by 


1 Monroe, Alan H. The measurement and analysis of audience reaction to student 
speakers. Studies in Higher Education, Purdue University, xxxii, 1937. 

2 Barnes, Harry G. A study of the speech needs and abilities of students in a first 
course in speech training at the college level. Unpublished Doctoral Dissertation, State 
University of Iowa, 1932. 
Gilkinson and Knower,³ “Monotonous voices,” “Inanimate bodies” and 
“Little facial expression” were not only indicated as high frequency faults, 
but were also shown to be speech characteristics clearly differentiating 
speakers of superior from inferior quality. At one point in their report 
on the Michigan Cooperative Study, Hayworth and colleagues⁴ have 
indicated that the number of “Meaningful facial expressions per minute” 
was found to be a factor of relatively high weight in the determination of 
“Total effectiveness of speech delivery.” At another point in their report 
on the evaluation of specific tests of “Vocal interpretative ability,” 
“Pantomiming ability,” and “Facial expression,” their data indicate 
moderate to marked relationship with “Public speaking effectiveness” for 
the first two tests; and approximately a zero correlation for the test of 
“Facial expression.” The absence of an indication of relationship in the 
second test may be attributed to the inadequacy of the particular test 
used. 

With the exception of part of the data in the Hayworth study, all of 
the data obtained on this problem in previous studies have been secured 
by the processes of rating a speaker on his use of these symbolic processes 
during the activity of speaking. The data in this study, on the contrary, 
are not derived from an evaluative judgment on the quality of the per- 
formance, but as explained later are actual interpretations of meaning. 
This investigation was undertaken to secure data which might throw light 
on the answers to the following questions. 

1. Is it possible to develop reliable and valid objective group tests of a 
speaker’s skill in the use of tonal and behavioral symbolism as such? 

2. What can we learn from such objective tests about the relationships 
of such skills to the total effectiveness of the speaker in speaking, and to 
other characteristics of the speaker as a person? 

The tests upon which this study is based are adaptations to group 
testing purposes of the measuring instruments used in the study of tonal 
and visible symbolism reported in the Quarterly Journal of Speech by 
Dusenbury and Knower.⁵ The test form consists of a sheet of paper 
containing in the left hand margin a list of eleven emotional states each 
designated by three terms selected to facilitate the recognition of the 
particular qualitative or quantitative differences in the emotional states 
to be expressed. A series of columns across the sheet is arranged to 
permit judges to record their interpretations of each performer’s stimula- 

3 Gilkinson, Howard, and Knower, Franklin H. Psychological studies of individual 
differences among students of speech. Univ. of Minnesota, Department of Speech, 1939. 

4 Hayworth, Donald. A research into the teaching of public speaking. National 
Association of Teachers of Speech, Detroit, 1940. Pp. 231. 

5 Dusenbury, Delwin, and Knower, Franklin H. Experimental studies of the 
symbolism of action and voice. Quart. J. Speech, 1938, 24, 424-436; 1939, 25, 67-74. 
tion in each performance. As each student is tested his name is placed 
at the head of a column and recorders’ interpretations are placed below it. 
The recorder’s name and other data are provided at the top of the judging 
sheet. 


Directions to the performers for the test of behavioral symbolism were as 
follows: “At the next meeting of the class you will be called upon to indicate 
the way you would express the eleven emotional states indicated on this sheet 
by facial gestures or pantomime. Although you are expected to depend pri- 
marily upon facial expression you may allow your body to adapt to the expres- 
sion you are trying to indicate in the face. You are requested especially to 
avoid hand gestures of conventional meanings. You are to use your articulators 
as if you were saying the letters of the alphabet from ‘a’ to ‘k,’ but you are not 
to form words with your lips which might be interpreted by lip readers. You 
are asked simply to express the emotions indicated as you would simulate them 
if you were portraying the emotions of a character in a story you were telling 
or in a play you were acting. During the class period I will call you to the desk 
on the platform one by one and hand you a series of cards one at a time on which 
have been typed the terms for the eleven emotional states to be expressed. As 
I hand out each card I will call out numbers from ‘1’ to ‘11’ and record these 
numbers on a key sheet for correcting the responses of the observers. The 
stimulus cards will be shuffled after each performance to vary the order for the 
next performance. You are expected to take only two to three minutes for 
the entire series of eleven expressions and you must therefore respond as quickly 
to the cue cards as you would to the cue lines in a play.” 

The directions for the members of the class, who served as judges when not 
performing, were as follows: “You are to place each performer’s name at the 
top of the column in which you record the judgments of his performance. As 
I call each number observe the speaker carefully, look quickly down the list 
of eleven emotional states and record the number in the proper column opposite 
that emotional state which in your judgment the speaker is simulating. You 
are to record a judgment for each number, and record each number opposite 
only one set of terms. After the first expression for each performer, if you 
interpret a later expression as the simulation of an emotion for which you al- 
ready have one judgment, you may place the second number also opposite the 
terms for that emotion. For every set of terms on which you record more than 
one judgment, you will leave one set of terms without a number. You should 
familiarize yourselves with the list in order to record your judgments rapidly.” 

The directions for performance in the tonal test differed from those for the 
visible symbolism only in the following manner. “At the next meeting of 
the class I will call each of you to the back of the room and ask you to 
simulate a tonal expression of each of the emotional states indicated on this 
sheet. You are to use no words but in giving tonal body and pattern to the 
expression you are to articulate the letters of the alphabet, from ‘a’ to ‘k.’” 


The process of judging the tonal expressions differed from the process 
of judging the facial expressions only in that the judges listened to the 
tonal expressions rather than observed the behavior of the speakers. As 
has been indicated, the instructor in the class made a key during each 
performance for the purpose of checking the correctness of interpretations. 
Since in no case were there fewer than sixteen persons in the class section 
tested, there were at least fifteen judges for each performance. 

The tests were scored in two ways. First, the number of correct judgments 
for each performer was divided by the total number of judgments 
when checked against the key to provide a score in terms of a percentage 
of audience comprehension of each performance. Secondly, the number 
of correct judgments rendered by each judge was divided by the total 
number of his judgments to provide a score in terms of the percentage of 
his comprehension of the performances of all other members of the class. 
Although the class averages for the performances and judgments were of 
course always the same, marked differences in individual ability of both 
types occurred within most classes. 
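
The two scoring rules just described can be illustrated with a short sketch. The Python below is not part of the original study: the function name `score_class`, the dictionary layout, and the toy class of three students and three emotions are assumptions made only for illustration, and the rule allowing a judge to enter two numbers opposite one emotion is not modeled.

```python
from collections import defaultdict


def score_class(key, judgments):
    """Return (performance, judgment) percentage scores for one class.

    key[performer]                -> emotions in the order performed (the instructor's key)
    judgments[(judge, performer)] -> emotions the judge recorded, in the same order
    Both layouts are hypothetical; the study itself used paper judging sheets.
    """
    perf_hits, perf_total = defaultdict(int), defaultdict(int)
    judge_hits, judge_total = defaultdict(int), defaultdict(int)
    for (judge, performer), recorded in judgments.items():
        # A judgment is correct when it agrees with the key for that expression.
        hits = sum(got == want for got, want in zip(recorded, key[performer]))
        perf_hits[performer] += hits
        perf_total[performer] += len(recorded)
        judge_hits[judge] += hits
        judge_total[judge] += len(recorded)
    performance = {p: 100 * perf_hits[p] / perf_total[p] for p in perf_total}
    judgment = {j: 100 * judge_hits[j] / judge_total[j] for j in judge_total}
    return performance, judgment


def class_mean(scores):
    return sum(scores.values()) / len(scores)


# Toy data (invented): every student judges every other student.
key = {
    "A": ["joy", "fear", "anger"],
    "B": ["fear", "anger", "joy"],
    "C": ["anger", "joy", "fear"],
}
judgments = {
    ("B", "A"): ["joy", "fear", "joy"],
    ("C", "A"): ["joy", "anger", "anger"],
    ("A", "B"): ["fear", "anger", "joy"],
    ("C", "B"): ["fear", "joy", "joy"],
    ("A", "C"): ["anger", "joy", "fear"],
    ("B", "C"): ["joy", "joy", "fear"],
}
performance, judgment = score_class(key, judgments)
```

In this toy class student A is understood two-thirds of the time as a performer but interprets every expression of the others correctly as a judge, while the two class averages coincide, as the text notes they must when every member judges every other member.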

Before analyzing the data from the study, it may be well to consider 
some of the advantages and limitations of tests of this type. On the 
question of their desirable features, I wish to call attention first to the fact 
that these tests are actual tests of the social dynamics of speech. The 
score in no way depends upon an auditor’s interpretative judgment of the 
excellence of the performance, but only on the specific percentage of the 
speaker’s intelligibility. Since the tests are of this type they may be used 
to analyze individual differences in the nature of the behavior of both 
speakers and auditors. A relatively large number of persons can be 
tested in the average class period in a situation which approaches normal 
student speaker-auditor relationships. The fact that the test may func- 
tion as a learning activity commends it as a classroom exercise apart from 
its test features. 

Such tests may be criticized as abstractions in that they isolate tonal 
and behavioral symbolism from their normal accompaniment in speech of 
linguistic expression of ideas. While this criticism is obviously sound, it 
is probably not more true for the test in question than for typical tests of 
articulatory or linguistic skill. Although intelligent persons do not ordi- 
narily go about engaging in unprovoked facial activity or vocal noises, 
neither do any but school teachers and other intellectually curious persons 
go about randomly articulating phonemes, declining adjectives or parsing 
sentences. These characteristics of tests probably are necessary limita- 
tions of the problem at hand. Although the tests may not provide an 
adequate index of refinement in the use of non-linguistic symbolism for 
advanced students, the limitation of scores to intelligibility provides a 
sufficient top for most students with limited speech skills. A third ques- 
tion may be raised concerning a test of a speaker’s performance which 
limits the index of his skill to the comprehension ability of his audience. 
This might be a serious limitation were it not for the fact that in almost 
every class tested the range of scores on performance has been moderately 
high. Data will be presented which indicate that in a class of at least 
fourteen judges the reliability of the tests is sufficiently high for purposes 
of group analysis. 
The major data on the project are presented in Table 1. The correlation 
of the performance scores of subjects as interpreted by seven judges 
with scores as interpreted by seven additional judges produces what may 
be called a split-half index of the reliability with which performances are 
judged. When corrected for the split-half technique, the reliability 
correlations for scores on behavioral symbolism were +.93 and for tonal 
symbolism +.87. The individual performances, then, were reliably 
interpreted. 


                                  Table 1
                   Reliability and Validity Correlations

                                 Behavioral Symbols         Tonal Symbols
                                 Perf’nce    Judg’nt     Perf’nce    Judg’nt
Seven with Seven Judges
  (Corrected)                    .93 ± .01               .87 ± .02
Test-Retest                      .52 ± .05   .69 ± .03   .66 ± .03   .92 ± .01
Speech Rtgs. for General
  Effect’ness                    .47 ± .04   .56 ± .04   .25 ± .05   .56 ± .04
Speech Rtgs. for Adjustment      .42 ± .05   .40 ± .05   .34 ± .05   .41 ± .05
Speech Rtgs. on Phonation                                .31 ± .05   .46 ± .04
Performance with Judgment
  (1st Test)                           .57 ± .03               .49 ± .04
Performance with Judgment
  (2nd Test)                           .33 ± .06               .55 ± .06


To determine whether or not the expressional skill of the 
individual was consistently characteristic of his performance, each test 
was repeated after an interval of one week for 100 subjects. The second 
line of the table indicates these test-retest correlations. Although these 
correlations indicate a substantial amount of consistency in the traits 
measured, they also indicate that there was considerable variation in per- 
formance on the second test, with skill in the use of tonal symbolism 
varying less than skill in the use of behavioral symbolism. Since the 
indices or scores for each performer are based on the average accuracy of 
interpretation of fifteen listeners (where fourteen judges produced a 
reliability of +.93), these correlations cannot be attributed to the 
unreliability of the particular scores. These test-retest correlations were 
seriously affected by the average gain in the effectiveness of performance 
on the second test, and by the fact that the range of scores was considerably 
reduced on the second test. Since the average gain in effectiveness of 
the use of tonal symbolism was as high as the average gain in effectiveness 
of the use of behavioral symbolism, the higher test-retest correlations for 
the test of tonal symbolism indicate that fewer subjects approximated the 
top of the tonal test than the behavioral test. This assumption is supported 
by the fact that the mean level of performance in the use of tonal 
symbols was considerably lower than the mean level of the skills in 
behavioral symbolism. 
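
The correction "for the split-half technique" mentioned above is not named in the text. The sketch below assumes it is the Spearman-Brown formula, the usual step-up from a half-length to a full-length reliability. The function names and the two lists of half-panel scores are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, k=2):
    """Reliability of a panel k times as large as the one that gave r_half."""
    return k * r_half / (1 + (k - 1) * r_half)


def half_from_full(r_full):
    """Invert the k = 2 case: the half-panel correlation implied by r_full."""
    return r_full / (2 - r_full)


# Invented performance scores for six speakers, each scored by two
# separate panels of seven judges (percent of judgments correct).
panel_a = [55, 62, 70, 48, 81, 66]
panel_b = [58, 60, 74, 50, 77, 69]

r_half = pearson_r(panel_a, panel_b)   # agreement between the two half panels
r_full = spearman_brown(r_half)        # estimated reliability of all fourteen judges

# Working backward from the reported corrected coefficients:
behavioral_half = half_from_full(0.93)
tonal_half = half_from_full(0.87)
```

Under this assumption, the corrected coefficients of +.93 and +.87 would correspond to correlations of roughly .87 and .77 between the two seven-judge panels before correction.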

The correlation indices of validity may be best described as moderate 
to marked in comparison with other tests of this type. These data are 
inconsistent with those reported by Hayworth in that this test of per- 
formance in the use of facial expression appears to be more closely related 
to speech skill than does the test of the use of tonal symbolism. An 
interesting feature of these correlations is found in the fact that ratings on 
performance are more closely correlated with skill in judgments of per- 
formance than with skill in performance itself. If this phenomenon 
should be found to be consistently true of this type of test, and if these 
validity correlations may be improved by the development of the tests, it 
will mean that since judgment scores may be obtained more conveniently 
than performance scores, the potential usefulness of the tests will be 
greatly improved. 











                                 Table 2
            Means and Standard Deviations of the Distributions

                              First Performance          Second Performance
                                    S.D. of  S.D. of            S.D. of  S.D. of
                            Means   Perf.    Judg.      Means   Perf.    Judg.
Behavioral Symbols:
  Freshmen (Required)       62.70   19.40    14.10      71.10   13.10    10.59
  Sophomores (Elective)     70.90   10.85
Tonal Symbols:
  Freshmen (Required)       45.10   13.70     9.15      57.40   13.50    10.90
  Sophomores (Elective)     66.80   13.95
  Acting Class              83.30    4.95





Table 2 contains some additional data on validity in the form of means 
and standard deviations of distributions of scores for various groups. A 
group of 50 sophomores in an elective course in speech were compared 
with 100 freshmen in a required course and found to be eight per cent more 
intelligible in the use of visible symbolism and twenty-one per cent more 
intelligible in the use of tonal symbolism than were the freshmen. The 
difference between the two groups is highly significant in the use of tonal 
symbolism and probably reliable in the use of visible symbolism. Nine- 
teen students in an advanced class in acting received scores on the tonal 
test that were significantly higher than the scores received by the 
sophomores. 
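
The text does not say how the significance of these group differences was judged. As a rough check, the sketch below assumes the conventional critical ratio for independent groups, that is, the difference between two means divided by the standard error of that difference, and applies it to the means and performance standard deviations of Table 2 with the group sizes given above. The function name is hypothetical.

```python
from math import sqrt


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two group means in units of its standard error.

    Assumes independent groups; the source does not state which formula
    its authors actually used.
    """
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


# Sophomores (N = 50) against freshmen on their first performance (N = 100).
tonal = critical_ratio(66.80, 13.95, 50, 45.10, 13.70, 100)
behavioral = critical_ratio(70.90, 10.85, 50, 62.70, 19.40, 100)

# Acting class (N = 19) against sophomores (N = 50) on the tonal test.
acting = critical_ratio(83.30, 4.95, 19, 66.80, 13.95, 50)
```

On these assumptions the ratios come to about 9.0 for tonal symbolism, 3.3 for behavioral symbolism, and 7.2 for the acting class, which agrees in order with the wording "highly significant" for the tonal difference and "probably reliable" for the visible one.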

The standard deviations of the judgment scores of the freshmen on the 
first tests are here seen to be considerably higher than their deviations on 
the second tests. The amount of improvement on the second tests is also 
indicated here in terms of higher mean scores on the second tests. I 
should like to point out that, although these mean scores are considerably 
lower than those reported for Dusenbury and Knower’s advanced performers, 
they are still high enough to indicate that a highly significant 
amount of communication may take place through the use of tonal and 
behavioral symbolism by the average college student. 

In concluding, I wish to comment on the last two lines of correlations 
in Table 1. In spite of apparent differences in skill in performance and 
judgment these correlations indicate a considerable relationship between 
these two traits. The relationship appears to be more constant in the 
area of tonal symbolism than in the area of visible symbolism. These 
data support the suggestion previously advanced that a group test of skill 
in the judgment of tonal and behavioral symbolism might be developed 
which will provide a useful test of skill in performance. Such tests are 
useful as classroom checks on the development of significant aspects of 
speech achievement. 


Received May 11, 1944. 











Participation in High-School Football as a Factor Affecting 
College Attendance and Scholarship 


Erwin J. Henning and Harold D. Carter * 
School of Education, University of California 


Does participation in high-school football exert a significant influence 
upon the educational plans and later educational careers of high-school 
students? If so, what are the effects of such participation, and how do 
they operate? These questions, which have long been subjects for heated 
discussion as well as for organized inquiry, are approached again here, 
with a variation in technique that appears to be new to this field of 
investigation. 

In order to develop a standard for estimating the effect of playing 
football on the educational careers of the players, a control group was also 
studied. Each high-school football player was matched with a classmate 
who did not play football. The college attendance and achievement of 
the control group were used as norms for comparison with the performance 
of the athletes. 

Records were taken for 2,875 pupils, including 220 football athletes, 
graduating from four large high schools in the years 1935-1937 inclusive. 
These high schools were all situated in the urban area on the eastern side of 
San Francisco Bay. The 220 football players were matched with 220 
of their classmates on the basis of six criteria, namely: (1) school; (2) date 
of graduation; (3) high-school scholarship; (4) average measured intelli- 
gence; (5) reading quotients as measured by standardized tests; and (6) 
college preparatory study load, as measured by the number of units 
earned in courses which satisfy prescribed college entrance requirements. 
In the pages which follow, a detailed report is made concerning the high- 
school work and the college attendance and scholarship of the two 
groups. 

Significant Earlier Studies 


The controversy over college recruiting and subsidizing of athletes, 
which was at its height in the late 1920’s, precipitated the famous Carnegie 
investigation reported in 1929 in the bulletin on American College 
Athletics (4). In that research the case study technique was emphasized. 


* The writers are indebted to Dr. Robert Gordon Sproul, President of the University 
of California, for his encouragement, and for financial assistance which made the study 
possible. 
Investigators traveled from college to college studying conditions pertaining 
to the treatment of athletes, and reporting their findings. An accompanying 
survey of the literature resulted in a separate publication reported 
in the 24th bulletin of the society (5). This is an authoritatively 
comprehensive survey of the literature on athletics up to 1929. By 1933, the 
Carnegie Foundation had spent $125,000 on its investigations of athletics. 

The Carnegie research found previous studies on the relation between 
scholarship and athletics ill-controlled and inconclusive. Lack of uniform 
definitions made it impossible to compare results in the various schools. 
For this reason, Howard Savage, who directed the Carnegie investigation, 
sponsored a study of scholarship and athletics at Columbia University. 
This study served as a model for coordinated projects in 52 colleges (15). 
Nearly all later research on the scholarship of athletes in college has also 
followed Savage’s model. The various investigations demonstrate that 
the athlete tends to be slightly inferior to the non-athlete in measured 
intelligence and in grades earned, but not enough so as to interfere seri- 
ously with his scholastic work. Athletes take a normal study load and a 
normal variety of college courses, and have slightly smaller academic 
mortality than non-athletes; however, more athletes than non-athletes 
receive grades near the failing mark, and more are on probation. These 
studies show a difference among sports, with sports for individuals, such 
as tennis, golf, track, etc., ranking high when judged by the academic 
achievement of participants, whereas team, spectator sports such as 
baseball and football rank low when judged by the same criteria. Later 
college investigations such as those by Tuttle (18), Hackensmith and 
Meller (10), Maney (12), and others tend to confirm the findings reported 
in the Carnegie Foundation studies. 

In the field of high-school athletics, Reals and Rees (14), Mathews 
(13), Hull (11), and Allen (1) find athletes slightly inferior to their non- 
athletic fellows. In no case is the reported difference large. Cormany 
(7), Cook (6), Beu (2), Schulman (16), and Shannon (17) report high- 
school athletes equal or slightly superior to their classmates. Several of 
these reports show that while athletes tend to be inferior to their fellow 
students who are active in other extra-curricular activities, they tend to 
be superior to those who have no extra-curricular interests. Davis and 
Cooper (8) reviewed 41 studies in this field, and concluded that probably 
the non-athletes are slightly superior to the athletes, but that the differ- 
ence is not significant either statistically or educationally. 

The above are but a sampling of the reports on athletes in high school 
and in college. Very few studies have followed high-school athletes into 
college. To do so seems desirable, since the controversy rages over the 
inducing of high school athletes into college by promises of glory, financial 
rewards, or lowering of academic standards. Buck (3), studying 1098 
high-school boys in Colorado, found that 21.8 per cent of the athletes 
planned to go to college, whereas only 11.6 per cent of the non-athletes 
planned to do so. He gave no data as to the relative college qualifications 
of the two groups. Among the athletes, the athletic prestige of the col- 
lege stood first in determining a specific choice, whereas this was last in 
the list for non-athletes. Relatively more athletes than non-athletes 
persisted in college until they received a degree. Here again, only meager 
data are presented as to the relative qualifications of the two groups. 

The present study is unique in that it begins with high-school groups 
of equivalent academic qualifications, and follows them into college. 


High-School Records 


A by-product of matching football athletes with controls was the 
accumulation of a large mass of data regarding the high-school records of 
all athletes and non-athletes. These data are entirely in agreement with 
the literature on the scholarship and ability of high-school athletes. 
Table 1 presents the results. They show the non-athletes to be slightly 
though consistently superior to the football players. 

These data have more value than merely substantiating the extensive 
literature in the field. By falling completely in agreement with numerous 
other investigations, they demonstrate the typicalness of the high-school 
sample used here, and hence indicate the validity of our further findings. 
They encourage one to generalize with confidence beyond the four specific 
high-school populations studied. 

The above-mentioned data are for total populations, showing that 
the football athletes closely resemble the rest of the student body. For 
the matched pairs, however, the resemblance is still closer as a result of 
the process of matching. The median difference between the football 
players and their controls in grade point ratio, intelligence quotients, 
reading quotients, and type of high-school load is only .19 times as large 
as its standard error, demonstrating the practical identity of the qualifica- 
tions of the matched groups. This identity is necessary to satisfy the 
basic premise of this study. If athletes and their controls are substan- 
tially identical in the qualifications which ordinarily determine college 
entrance and success, then the average college experiences of the two 
groups should also be identical within statistical limits of chance variation, 
provided that playing football in high-school has no influence in college. 
Conversely, any real variation between the experiences of the two groups 
is attributed to the football qualifications of the experimental group. 
                                   Table 1
 High School Record of Football Athletes Compared with That of Various Other Groups

                                               N        Mean
Grade Point Ratio:
  1. All football players                                2.47
  2. All others                                          2.54
  3. Matched control group                               2.47
  4. Athletes attending college                          2.62
  5. Controls attending college                          2.76

Intelligence Quotients:
  1. All football players                              106.44
  2. All others                                        107.94
  3. Matched control group                             106.80
  4. Athletes attending college                        110.31
  5. Controls attending college                        112.71

Reading Quotients:
  1. All football players                              107.10
  2. All others                                        110.58
  3. Matched control group                             108.84
  4. Athletes attending college                        110.42
  5. Controls attending college                        113.90

Number of Half-Units College Preparatory Courses:
  1. All football players                     220       25.98
  2. All others                                55       25.30
  3. Matched control group                    220       25.57
  4. Athletes attending college               133       27.95
  5. Controls attending college                99       28.13





Plans to Attend College 


More football athletes than controls planned to go to college, as indi- 
cated by transcripts sent. Table 2 shows that 28.2 per cent more athletes 
than controls sent transcripts to college. This difference is 7.52 times 
its standard error, and hence is statistically significant. The University 
of California received more enquiries from athletes than from controls, but 
the difference is not significant. All colleges receiving fewer than three 
enquiries from prospective students in this group were classified as
“miscellaneous” colleges. In the aggregate, these miscellaneous colleges
received 47 enquiries from football students and 37 from controls. The
difference is not significant. Besides the University of California and 
the miscellaneous colleges, thirteen other colleges received transcripts 
from prospective students. Of these, the seven receiving the majority 
of the transcripts were all local institutions which emphasize football.
This is the group of colleges which received significantly more transcripts 
from football athletes than from members of the control group. 


Table 2
Planned and Actual College Attendance of Members of Matched Groups

                                            Football   Control                         Diff.
                                             Group      Group     Diff.   S.E.diff.   S.E.diff.

Planned college attendance as indicated by transcripts:
  Total number sending transcripts            201        139
  % of 220 sending transcripts                91.4       63.2     28.2      3.75        7.52
  Number planning to go to U.C.                77         66
  % of 220 planning to go to U.C.             35.0       30.0      5.0      4.46        1.12
  Number planning to go to 13 colleges         78         36
  % of 220 planning to go to 13 colleges      35.4       16.4     19.0      4.07        4.67

Actual college attendance of members of matched groups:
  Total number going to college               133         99
  % of 220 going to college                   60.5       45.0     15.5      4.69        3.30
  Number going to U.C.                         62         56
  % of 220 going to U.C.                      28.2       25.5      2.7      4.21         .64
  Number going to 13 colleges                  53         24
  % of 220 going to 13 colleges               24.1       10.9     13.2      3.54        3.73
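
The ratios in the last column of Table 2 can be approximately reproduced from the counts alone. The sketch below is an illustration, not the author's stated procedure: it assumes the usual standard error of a difference between two independent proportions, sqrt(p1*q1/n1 + p2*q2/n2). The paper does not say which formula was used, and the function name is hypothetical.

```python
from math import sqrt

def critical_ratio(count_a, count_b, n_a=220, n_b=220):
    """Return (difference, standard error, ratio) in proportion units.

    Assumption: unpooled standard error for two independent proportions.
    The paper does not quote its formula; this one is supplied for illustration.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, se, diff / se

# Transcripts sent: 201 of 220 athletes versus 139 of 220 controls (Table 2).
diff, se, ratio = critical_ratio(201, 139)
print(f"diff = {100 * diff:.1f}%  S.E. = {100 * se:.2f}%  ratio = {ratio:.2f}")
```

Under this assumption the results fall close to the tabled 28.2, 3.75, and 7.52; small discrepancies in the last digit are to be expected from rounding of the percentages before division.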





College Attendance and High-School Scholarship


Table 2 summarizes facts from the college attendance records of members
of the matched groups. The proportion of athletes attending the University
of California exceeded that of controls by 2.7 per cent; the difference is
negligible. The proportion attending the thirteen colleges was 13.2 per cent
higher for athletes than for controls; this difference is significant.
Eighteen football athletes and nineteen controls attended miscellaneous
colleges. Thus a total of 133 athletes attended college, compared with 99
members of the control group. The difference is 3.30 times as large as its
standard error. The excess college attendance
of athletes over controls is due almost entirely to the fact that the thirteen 
colleges accepted more athletes than members of the control group. The
inference follows that the excess college attendance of the athletes is due
to their high-school football experience.

The data in Table 1 permit comparison of the qualifications of the
athletes who attended college with the qualifications of the controls who
did so. The athletes tend to be inferior to the controls. Which colleges
accepted these inferior students? The qualifications of the members of 
the matched groups who attended the miscellaneous colleges are nearly 
equivalent. The data for members of the matched groups attending the
thirteen colleges and the University of California are compared in Table 3. 
At the University of California the high-school football students who 
entered college came with generally better qualifications than those of 
their controls. At the thirteen colleges the reverse was true. Table 3
emphasizes this situation by comparing the football students who attended
the University of California with the football students who attended the
thirteen colleges. A similar comparison is made for members of the control
group who attended these schools.


Table 3
Comparison of High-School Qualifications of Football Athletes and Control Groups
Attending the University of California and the 13 Colleges

                                              Football Groups          Control Groups
                                             U.C.    13 Colleges      U.C.    13 Colleges

Number attending                              62         53            56         24

Mean high-school grade-point ratio           2.90
  Difference between means                         .52
  Standard error of difference                     .08
  Critical ratio                                  6.23

Mean high-school IQ                        114.07
  Difference between means                        7.07
  Standard error of difference                    1.74
  Critical ratio                                  4.06                      1.35

Mean high-school reading quotient          112.10
  Difference between means                        3.12                       .10
  Standard error of difference                    2.46
  Critical ratio                                  1.27                       .04

Mean number half-units college
preparatory courses                         29.71                     28.82
  Difference between means                        3.46                      1.44
  Standard error of difference                     .93                       .60
  Critical ratio                                  3.72                      2.40


In all cases the high-school qualifications of students attending the 
University of California were superior to the average qualifications of 
those who attended the thirteen colleges. This is true both for athletes 
and for controls. The University of California attracts the better stu- 
dents from this area. However, the differences between the qualifications 
of the controls who attended the University of California and the thirteen 
colleges were relatively small and statistically non-significant. The criti- 
cal ratios are displayed in Table 3. Thus while the state university 
attracted generally better-qualified students than the thirteen colleges,
the differences were small for ordinary students. The seven local colleges 
emphasizing football, however, attracted and accepted football students 
who were definitely inferior in academic ability and achievement. Since 
this is not true for non-football students, the results suggest a lowering of 
standards by these local colleges, for prospective football players. The 
same results indicate that at the University of California standards were 
not lowered for football players. 








Table 4
Average Grade-Point Ratios Earned in College by Members of Matched Groups

                                                                    Diff.
                               N      Mean     Diff.   S.E.diff.   S.E.diff.

University of California
  Football Group               62     1.272
  Control Group                56     1.141     .131     .081        1.62

Thirteen Colleges
  Football Group               53     0.782
  Control Group                24     1.105     .323     .187        1.73

Totals
  Football Group              115     1.046
  Control Group                80     1.133     .087     .086        1.01





Table 4 shows that at the University of California the football 
athletes earned slightly higher grades than members of the control group.
The difference is so small, however, as to be insignificant. On the other 
hand, the grades received by football players in the thirteen colleges were 
inferior to those of their controls. These results are exactly in agreement 
with the high-school qualifications of the groups concerned. Thus while 
the thirteen colleges accepted football players with inferior qualifications, 
the grades given them were not out of line with their abilities. Appar- 
ently the influence which gets these inferior students into college does not 
extend to the classroom teachers who assign them grades. 


Other Comparisons 


A number of additional comparisons are furnished in Table 5. Of 
the football players entering college, 45.1 per cent were eventually gradu- 
ated, as compared with 42.4 per cent of the controls. Thus, in spite of 
inferior entering qualifications, the athletes were graduated at least as 
frequently as were the controls. 

Table 5
Miscellaneous Data Concerning Members of Matched Groups Who Attended College

                                    Football Group      Control Group
                                      N        %          N        %

University of California:
  Number who attended                 62
  Average no. of semesters           7.6
  Number graduating                   36
  Semesters on probation              53
  Number dismissed                    10

Thirteen Colleges:
  Number who attended                 53
  Average no. of semesters           4.3
  Number graduating                   19
  Semesters on probation              25
  Number dismissed                     7

Miscellaneous Colleges:
  Number who attended                 18
  Average no. of semesters           4.3
  Number graduating                    5

Total Group:
  Number in college                  133
  Average no. of semesters           5.8
  Number graduating                   60      45.1                42.4
  Semesters on probation              78      11.2                 9.4
  Number dismissed                    17      14.7                10.0


The types of college courses pursued by athletes and controls were
equally distributed. While the athletes took fewer technical and
commercial courses than the controls, they took about the same number of
professional courses, arts courses, and vocational courses. The distribu- 
tion is normal enough to refute the argument that athletes take only easy 
courses in college. 

As to the relative numbers of the two groups who were on probation, 
numbers of honors received, and numbers of semesters in attendance, the 
data tend rather consistently to favor the control group; but by very 
small margins. 


Star Athletes 


From the group of 220 athletes, the sixty most outstanding football 
players were selected. These were compared with the other athletes, with 
the controls, and with the general population. No data, either in high- 
school or in college, yielded sufficiently large differences to permit one to 
conclude that the sixty star athletes enjoyed an experience in any way 
atypical for the whole football group. 
Summary and Conclusions


A study has been made comparing the college attendance and scholar- 
ship of 220 high-school football athletes with similar data for 220 of their 
classmates who did not play football. The two groups were carefully 
matched for intelligence and high-school scholarship. The data lead to 
the following conclusions: 

1. In comparison with other students of equivalent academic qualifica- 
tions, football athletes more often plan to go to college. 

2. A greater proportion of football athletes actually do go to college. 

3. A greater proportion of football athletes graduate from college. 

4. The football athletes have slightly inferior high-school records. 

5. The football athletes tend to enter certain colleges in which football
is a prominent sport. It is inferred that standards of admission in these
colleges are lowered for prospective football players.


Received May 26, 1944. 
Two Methods of Combining Attitudes of Like, Indifference and
Dislike Into One Score 


Philip Eisenberg 
Columbia Broadcasting System, Inc., New York City 


In radio research, as in many other fields of investigation, the social 
scientist frequently asks his subjects to express their like or dislike of a 
stimulus. Since it is undesirable to force the individual to express like or 
dislike when he is not sure of his opinion or when he does not feel strongly 
either way, he is frequently permitted to express his doubt or indifference. 

Thus, the listeners’ judgments can be completely distributed among 
the three categories of like, indifference and dislike. For convenient 
comparisons of reactions to different stimuli, it is desirable to express the 
three percentages in one combined score. Two such combined scores 
have been used. The problem of this investigation is to determine the 
relative merits of the two scores as applied to the attitudes of groups of 
listeners responding to various radio programs.¹


The Two Scores 


A. The LD-score. The customary technique of combining the three 
reactions into one score is to assign relative weights to each reaction and 
sum the resultant products. One way of doing this is to subtract percent- 
age dislike from percentage like. Minus signs can be eliminated by 
assigning weights of 2 for like, 1 for indifference, and 0 for dislike. Either 
method yields the same result. We will refer to this score as the Like- 
minus-Dislike score, or for convenience, as the LD-score. 
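
As a small illustration of the two equivalent scorings just described, the sketch below computes both for one program. The function names are hypothetical. When the three percentages sum to 100, the weighted score equals the like-minus-dislike difference plus a constant of 100, so the two orderings of programs are identical.

```python
def ld_score(like, dislike):
    """Percentage like minus percentage dislike."""
    return like - dislike

def weighted_score(like, indifference, dislike):
    """Weights of 2 for like, 1 for indifference, 0 for dislike."""
    return 2 * like + 1 * indifference + 0 * dislike

# Percentages reported for the "liked" program in Table 1: 69, 24, 7.
print(ld_score(69, 7))            # 62
print(weighted_score(69, 24, 7))  # 162, i.e. 100 + 62
```

Table 1 reports the difference as a proportion (.62) rather than as a percentage.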

B. The S-score. Lazarsfeld and Robinson² have proposed another
technique of combining the three reactions into one score, which is essen- 
tially a sigma score, and can therefore be called an S-score. It is based 
on three assumptions: 

1 All data were obtained with the use of the Lazarsfeld-Stanton program analyzer 
technique as used by the Program Analysis Division of the Columbia Broadcasting 
System. In this technique, the listener is asked to express his attitudes throughout a 
broadcast by pressing a green button when he likes what he is listening to, a red button 
when he dislikes it, and neither button when he is indifferent. The pressing of the 
buttons is automatically recorded on a moving tape. For a more detailed description 
of the program analyzer, see the article by T. Hallonquist and E. Suchman in Radio Re- 
search 1942-1943 edited by P. F. Lazarsfeld and F. Stanton, pp. 265-334. 

2 Lazarsfeld, P. F., and Robinson, W. S. Some properties of the trichotomy “like,
no opinion, dislike” and their psychological interpretation. Sociometry, 1940, 3, 151-178. 
1. Reactions to a stimulus are quantitative and in a continuous series.
Three types of reaction are obtained because of the limitation of the in- 
structions given to the subject. But it is safe to assume that different 
degrees of feeling are expressed at different times within the same category 
of reaction. This is borne out by the comments made when the subjects 
talk about a program to which they have reacted. 

2. Reactions are normally distributed. Since intensity of reactions is 
not measured, this assumption cannot be tested directly. However, the 
normal distribution remains a useful assumption, especially since it is 
unlikely that the distribution of reactions to a radio program ever assumes 
the shape of a J-curve. 

3. A point of true neutrality exists within the indifference range. This 
assumption is really a specific aspect of the first two assumptions. Since 
it is assumed that there is a gradient of reactions ranging from extreme 
dislike to extreme like, there must be some point between these two ex- 
tremes of true indifference or neutrality. The most likely point of neu- 
trality seems to lie somewhere in the middle of the indifference range. 
Lazarsfeld and Robinson present some empirical evidence to support this 
assumption.³

The S-score itself is the distance on the base line from the neutrality
point (the middle of the indifference range) to the average of the
distribution, expressed in sigma units. The method of its computation is
described in the previously cited article by Lazarsfeld and Robinson.
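
The computation is given in the cited article and is not reproduced here. The following is only a sketch of one reading of the definition under the three assumptions above: reactions follow a unit normal curve, the dislike and like percentages fix the two boundaries of the indifference range, and the neutrality point lies midway between them. The function name is hypothetical.

```python
from statistics import NormalDist

def s_score(like_pct, dislike_pct):
    """Distance in sigma units from the neutrality point to the mean.

    Assumes a standard normal distribution of reactions (mean 0, sigma 1)
    and a neutrality point at the middle of the indifference range.
    Both percentages must lie strictly between 0 and 100.
    """
    z = NormalDist()
    lower = z.inv_cdf(dislike_pct / 100)     # dislike / indifference boundary
    upper = z.inv_cdf(1 - like_pct / 100)    # indifference / like boundary
    neutral = (lower + upper) / 2            # middle of the indifference range
    return 0.0 - neutral                     # the mean sits at zero

for name, like, dislike in [("Liked", 69, 7), ("Indifferent", 37, 9), ("Disliked", 25, 27)]:
    print(name, round(s_score(like, dislike), 2))
```

Applied to the percentages in Table 1, this reading gives values close to the tabled S-scores of 1.00, .50, and -.03. It also agrees with the alternative view that the sigma distance of like minus the sigma distance of dislike is exactly twice the S-score.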

There are two apparent advantages of the S-score over the LD-score: 
(1) Standard deviation units are definite statistical terms with known 
meaning, whereas LD units are difficult to interpret. (2) Standard devi- 
ation units are equal, whereas LD units are not. 


The Method 


In order to study the relative merits of the S- and LD-scores for entire 
programs, the percentages of like, indifference and dislike were obtained 
for 53 different radio programs, representing a variety of program types. 
The number of subjects listening to a program varied from 49 to 116, with 
an average of 67. 

The S- and LD-scores were compared for the total ratings of the 53 
programs. They were also compared for program parts within three 
programs: one which was highly liked, one highly disliked, and one to
which most reactions were of indifference.


3 The assumption of a neutrality point is not necessarily required when it is noted that
the S-score can also be viewed as the distance of dislike in sigma units subtracted from 
the distance of like in sigma units. This method of calculation will yield a score exactly 
twice the size of the S-score. 
The S- and LD-scores and the percentages of like, indifference and
dislike for each of the three programs, and for the average of the 53 pro- 
grams, are presented in Table 1. 











Table 1
Reactions to Radio Programs

                        Scores              Percentages            No. of
Radio Programs        S       LD       Like   Indif.   Dislike    Subjects

Liked               1.00     .62        69      24        7          70
Indifferent          .50     .28        37      54        9         116
Disliked            -.03    -.02        25      48       27          50
Average of 53        .68     .40        49      43        9        3550





It can be seen from this table that “liked,” “disliked” and “indifferent”
programs are relative terms. Generally more reactions tended toward 
like than toward dislike, which is not unexpected since the programs are 
designed for entertainment. However, analyses of oral interviews of 
listeners to these programs confirm the designations obtained statistically. 


Results 


A. Relation between the S- and LD-scores. Table 2 presents correla- 
tions between the S- and LD-scores for total ratings in 53 programs and 
for ratings within three programs.⁴








Table 2
Correlations between S- and LD-Scores

                                    Correlations

Total Scores of 53 Programs             .97
Program Parts
  Liked Program                         .93
  Indifferent Program                   .79
  Disliked Program                      .99





With the exception of the indifferent program, it is apparent that the 
S-score and the LD-score are so highly correlated with each other that the 
use of either score will result in approximately the same rankings of 
programs or of program parts. The high correlations further suggest that 
the data would be interpreted in much the same way whether the S- or the 
LD-scores were used. 


4 In all cases, correlations for the total ratings of the 53 programs are Pearson product- 
moment, and for program parts are rank-difference. 
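
For readers unfamiliar with the two coefficients named in this footnote, the sketch below gives the textbook formulas. The data are hypothetical scores for five program parts, not values from this study, and the rank-difference formula as written assumes no tied scores.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_difference(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical S- and LD-scores for five program parts (illustration only).
s_scores = [0.10, 0.45, 0.80, 0.30, 0.95]
ld_scores = [0.05, 0.30, 0.50, 0.35, 0.60]
print(round(pearson(s_scores, ld_scores), 2))
print(round(rank_difference(s_scores, ld_scores), 2))
```

The rank-difference coefficient depends only on the ordering of the parts, which is why a high value implies nearly the same ranking under either score.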
Profile charts (not shown in this report) of S- and LD-scores for
program parts within the three programs show graphically the high
relationship between the two scores. However, it is interesting that at the
beginning and end of programs, and during applause and transition passages,
where the percentage of like and dislike usually decreases, the LD-score
tends to come closer to the base line than does the S-score. From this
one may conclude that the LD-score reflects increased “indifference”
more readily than the S-score.

B. Relations between the Two Scores and Percentage Like, Dislike and
Indifference. Table 3 presents the correlations between the S- and
LD-scores and the percentage of like (L), indifference (I) and dislike (D)
for the total scores of 53 programs and for parts within three programs.


Table 3
Correlations between Two Scores and Like, Indifference and Dislike

                    Total Scores              Program Parts
                         53          Liked     Indifferent    Disliked
                      Programs      Program      Program       Program

S-L                     .87           .86          .42           .68
S-I                    -.62          -.75         -.09          -.32
S-D                    -.80          -.27         -.64          -.80

LD-L                    .96           .97          .89           .74
LD-I                   -.80          -.91         -.56          -.37
LD-D                   -.69           .01         -.17          -.79

Minimum Significant
Correlation             .33           .39          .42           .45


From this table it is apparent that the S- and LD-scores correlate very 
much in the same way with the percentages of like, indifference and dis- 
like. However, the three reactions have a more equal weight in the S- 
than in the LD-score. This seems to be the case since the correlations for 
total scores and for program parts between the three reactions and the 
S-scores are more equal in size than for the LD-score. 

Another significant difference between the two scores is that the 
S-score seems to give greater weight to dislike than does the LD-score. 
The LD-score gives the greatest weight to like, the predominant reaction. 

C. The Reactions of ‘Indifferent’ Subjects. A further comparison of 
the two combined scores can be obtained by examining the “indifference”
reactions. One approach to this problem was to examine the reactions 
of the most indifferent subjects in various programs. This analysis will 
yield some information concerning sustained indifference at least. Those 
subjects who pressed either red or green buttons less than one-half of the
program time were arbitrarily classified as “indifferent” subjects. For
eleven different programs, the indifferent group averaged 37 per cent of 
all subjects. 

The general attitude of the subjects to a program was ascertained by 
their answer to a standard question: 


In order to get you to listen to future broadcasts in this series, should the 
programs be: very much like this one, improved a bit, or improved a good deal? 


Those who checked “very much like this one” have been designated
Satisfied listeners; those who checked “improved a bit,” Conditional listeners;
and those who checked “improved a good deal,” Dissatisfied listeners.
This question has been found to correlate very highly with analyzer 
reactions and with the tenor of the comments made by listeners in group 
interviews after each program. 

In eleven different programs, it was found that 27 per cent of the 
Satisfied listeners, 40 per cent of the Conditional listeners, and 55 per cent 
of the Dissatisfied listeners were “indifferent.” This indicates that the 
“indifferent” listeners tend to be more dissatisfied with a radio program 
than the “non-indifferent” listeners.

One cannot conclude from these findings that “indifference” is always 
a negative reaction. “Indifference” may express an intermediate state
between like and dislike. It may express anticipation or relaxation, as at 
the beginning of a program or in transition passages. However, it is clear 
that some so-called “indifference” reactions are really negative. This 
seems to be true of much of the sustained indifference. The S-score, 
therefore, seems to be superior to the LD-score, for radio programs at
least, since it gives more weight to dislike and in that way, seems to take 
into account that part of indifference which is really negative. 
Summary and Conclusions





The overwhelming conclusion from these data is that the S- and
LD-scores yield virtually the same results and compel virtually the same
interpretations of listeners’ responses to radio programs. This conclusion
supports the earlier investigation of Likert,⁵ in which he demonstrated
that sigma scaling of attitude questions is no more reliable or
discriminating than a scoring of 1, 2, 3, 4 and 5 for the five alternate answers.
Despite this conclusion, there are certain other considerations which 
militate in favor of the S-score: 


5 Likert, R. A technique for the measurement of attitudes. Archives of Psychology,
1932, No. 140.
1. The S-score reflects all three reactions more equitably than the
LD-score. The LD-score gives the greatest weight to the predominant
reaction, which in the case of radio programs is like. The S-score gives more
even weight to all reactions, which in the case of reactions to radio pro- 
grams, gives more weight to dislike. Such additional weight seems to be 
justified since it is highly likely that in a situation which is positively 
toned, any degree of dislike may have more significance than is indicated 
by a percentage. In addition, since it has been shown that some of the 
“indifference” reactions are really negative, dislike should be given more 
weight. 

2. The trend of the S-score is maintained in periods of “indifference.” 
The S-score does not drop as much as the LD-score at the opening and 
close of the program and during transition passages. The trend of the 
S-score seems to be more justified than the trend of the LD-score at such 
points because these are not really periods of indifference. Analysis of 
listeners’ attitudes during these periods reveals that the listener is not 
indifferent; he is waiting for something to happen. During transition 
passages and at the end of the program, the listener’s attitude is one of 
relaxation rather than indifference. 

3. The S-score units are equal whereas the LD-score units are not. 
Equality of units permits direct comparison of reactions to programs and 
to program parts. 


Received April 18, 1944. 
Book Reviews


Luckiesh, M. Light, vision and seeing. New York: D. Van Nostrand Co., 
1944. Pp. 323. $4.50. 


In this treatise Dr. Luckiesh gives a simplified presentation of the relation- 
ships of light, brightness, vision, lighting and seeing. An attempt has been 
made to combine fundamental facts with practical discussions. The material 
is aimed to be helpful to those interested in better seeing conditions and their 
effects upon human efficiency and welfare. The book is dedicated to “Better 
Light—Better Sight,” a slogan employed by the lighting industry. 

This book will make a strong appeal to the uncritical reader or to the
uninformed reader. It is of considerable importance, therefore, to examine the
material in some detail. First, however, let us note some of the contributions 
which should receive unequivocal approval: (1) The material is clearly presented 
in relatively simple language. (2) The author makes a strong case for the 
maintenance of hygienic conditions for seeing. (3) Desirable emphasis is 
placed upon the relation of brightness and brightness contrast to visibility and 
to ease of seeing. (4) An important place is rightly given to the interdepen- 
dence of the four fundamental factors (size, contrast, brightness, and time) that 
determine the visibility of objects. (5) The measurement of visual acuity is 
adequately handled. (6) One of the best sections deals with alternation of 
brightnesses and glare in the visual field in relation to efficiency and ease of 
seeing. (7) The section on light and color is practical and is effectively done. 

Analysis of the discussions suggests the following criticisms to the reviewer: 
(1) The author consistently ignores the fact that the eyes readily adapt to easy 
and effective seeing over a wide range of illumination intensities. Emphasis is
given only to the adaptation for effective vision at the higher levels. (2) The
dismissing of findings conflicting with the author’s views by means of ridicule
rather than on the basis of sound criticism is both ineffective and a sign of 
weakness. Thus we find employed such terms as “sheer nonsense,” “the valor
of ignorance,” “gross ignorance,” “ridiculous,” and “meaningless” to express
the author’s reaction to the contributions of others. (3) Several decades of 
work in the field of vision is no criterion of infallibility. Furthermore, the 
recurring phrases “it is axiomatic” or these “facts are axiomatic” become
unconvincing, since in certain instances the critical reader will recognize that they
are neither facts nor axiomatic. (4) Relatively high intensities are required 
for seeing where discrimination of details involves low brightness
contrast. It does not follow, however, that the same high intensities are necessary
in the large majority of everyday visual situations. This distinction is not 
made clear. (5) In considering the fixational pause of the eyes, questionable 
data are cited although adequate data are available in the literature. (6) In 
citing eye-movement time for reading (page 128), the record is decidedly
atypical or the computations are wrong. Durations of the back sweep and other
interfixation movements are far too long. (7) It is stated that 11 point type is far
above the average typography commonly encountered. This is misleading 
since actual figures show that journals and books are typically printed in 10,
11 or 12 point type. (8) The author’s attitude toward rate of reading as an 
indicator of ease of seeing is highly inconsistent. Thus: (a) “It is axiomatic 
that if very poor seeing conditions are improved . . . quantity of useful work 
done should improve.” (b) “As a criterion of optimum levels of illumination
. . . rate of performance of a visual task is inadequate.” (c) Nevertheless
speed of reading is employed to measure influence of brightness contrast between
print and background. (d) “At best, speed of reading is an insensitive
criterion” of ease of reading. (e) But the author employs eye-movement
measures (perception time, fixation frequency, pause duration, regression frequency)
in reading to show the effect of variation in level of illumination. He concludes 
that the effects of fatigue are obviously greater for the low level of illumination 
as is evidenced by the change in eye-movement measures. Apparently the 
author does not realize that eye-movement measures are merely measures of 
speed of reading. It seems that rate of perception (reading) is accepted as a 
criterion of ease of seeing only in situations where the results support his views. 
(9) To measure ease of seeing the author usually obtains results for one versus 
100 foot-candles, or one versus 10 versus 100 foot-candles. It is obvious 
that, in any visual situation where discrimination is concerned, one may expect 
improvement in visual efficiency in going from one to 100 foot-candles, or even 
from one to 10. And where discrimination is severe improvement is probable 
from 10 to 100 foot-candles. Luckiesh himself suggests that, to obtain a 
significant improvement in seeing, one should double the foot-candle level. 
He also states that one is generally more concerned with the practical optimum 
(level of illumination) than with the absolute maximum for many ordinary 
tasks. It is pertinent to ask, therefore, why he does not employ in his studies
1, 2, 4, 8, 16, 32, 64 and 128 foot-candles rather than only 1 versus 100, or 1 
versus 10 versus 100. Responses at various levels from 10 to 100 foot-candles 
have not been investigated. Possibly the rate of change in efficiency is ex- 
tremely slow from 15 or 20 to 100 foot-candles. In the majority of visual situa- 
tions which involve details considerably above the visual threshold in size, we 
are interested in knowing the level of illumination above which no practical 
gains in efficiency occur. This cannot be revealed by Luckiesh’s data. (10) 
The use of heart rate, the blink technique and visibility measurements as
criteria of ease of seeing has been criticized in another paper.¹ An additional
comment may be made. The reliability of blink scores for a five minute period 
of reading is low (+.49). Also one may question the stability of some differ- 
ences obtained. For example, Memphis medium type is called easier to read 
than Memphis bold because 7 per cent more blinks occur with the bold in five 
minutes of reading. About 25 blinks occur in five minutes. Thus the bold 
would be read with 1.6 more blinks per period. With only 18 to 40 subjects, 
is this a stable difference? (11) The author delights in setting up straw men 
so that they may be knocked down. Thus he implies that someone has stated 
that 10 foot-candles is enough for any kind of reading—which has not been done. 
Then he points out the need for relatively high intensities for people
with eye disabilities and for reading very illegible type, to which no one
objects.

Dr. Luckiesh has written an interesting and important book. The treat- 
ment of several topics may be considered excellent. Much of value may be 
gained from the rest of the material if read critically. 


Miles A. Tinker 
University of Minnesota 


1 Tinker, M. A. A reply to Dr. Luckiesh. J. appl. Psychol., 1943, 27, 469-472.
New Books, Monographs, and Pamphlets


Books, monographs, and pamphlets for listing and possible review should be 
sent to Donald G. Paterson, Editor, Department of Psychology, 
University of Minnesota, Minneapolis 14, Minnesota 


Outlook for the serviceman: A discussion of the education, re-employment, and 
rehabilitation of veterans. Colonel John N. Andrews. Institute on Postwar 
Reconstruction, New York University, Washington Square, New York 3, 
N. Y., 1944. Pp. 184. $.30. 

The college and teacher education. Armstrong, Hollis, and Davis. Request 
copies from Helen Seaton, American Council on Education, 744 Jackson 
Place, Washington 6, D. C., 1944. Pp. 311. $2.50. 

Your problem: Can it be solved? D.J. Bradley. New York: Macmillan Co., 
1945. Pp. 213. $2.00. 

Counseling in personnel work. A bibliography: 1940-1944. Compiled by Paul
S. Burnham. Public Administration Service, 1313 East 60th Street,
Chicago 37, Ill. $1.00.

Final report on the library film forums project, 1941-43. Glen Burch, Chairman.
American Library Association, 520 N. Michigan Ave., Chicago 11, Ill.
Pp. 41. $.50.

Employee counseling: A new viewpoint in industrial counseling. Nathaniel Can- 
tor. New York: McGraw-Hill Book Co., 1945. Pp. viii + 167. $2.00. 

Your personality. Virginia Case. New York: Macmillan Co., 1944. Pp. 277. 
$3.00. 

Pastoral work and personal counseling. Russell Dicks. New York: Macmillan 
Co., 1944. Pp. 230. $2.00. 

The Brush Foundation study of child growth and development: psychometric tests. 
Elizabeth Ebert and Katherine Simmons. Washington: Society for Research 
in Child Development, National Research Council, 1943. Pp. xiv + 113. 
(Monographs of the Society for Research in Child Development. 
Vol. VIII, No. 2.) 

Reading difficulty and personality organization. Edith Gann. New York: 
King’s Crown Press, 1945. Pp. xii + 152. $2.00. 

Marriage and family counseling. Sidney E. Goldstein. New York 18: Mc- 
Graw-Hill Book Co., 1945. Pp. 450. $3.50. 

Making and using industrial service ratings. George D. Halsey. New York: 
Harper & Bros., 1944. Pp. 149. $2.50. 

Large Scale Rorschach techniques. A manual for the group Rorschach and multiple 
choice test. M. R. Harrower-Erickson and M. E. Steiner. Springfield, 
Ill.: Charles C. Thomas, 1944. Pp. xii + 420. $8.50. 

Reality practice as educational method. Hendry, Lippitt, and Zander. New 
York: Beacon House, 1944. Psychodrama Monographs, No. 9. Pp. 36. 
$1.50. 

Occupational therapy in the treatment of the tuberculosis patient. Holland Hudson 
and Marjorie Fish. Livingston, N. Y.: Livingston Press, 1944. Pp. 
317. .00. 

Effects of music on factory production. W. A. Kerr. Stanford University: 
Stanford University Press, 1945. Pp. 40. $1.00. Applied Psychology 
Monograph No. 5. 

Managing your mind. S. H. Kraines and E. S. Thetford. New York: Macmillan 
Co., 1945. Pp. 374. $2.75. 

The technique of building personal leadership. D. A. Laird. New York 18: 
McGraw-Hill Book Co., 1944. Pp. 239. $2.00. 

The science of man in the world crisis. Ralph Linton et al. New York: 
Columbia University Press, 1945. Pp. xvi + 520. $4.00. 

A handbook for old age counsellors. L. J. Martin. San Francisco: Geertz 
Printing Co., 1944. Pp. 84. 

Color vision. S. D. Melville. Reprinted from four issues of The Optometric 
Weekly, 1944. $.50. Obtain reprints from Reading Clinic Sec’y, Room 8, 
Burrowes Educ. Bldg., Penn. State College, State College, Pa. 

Soldier to Civilian. G. K. Pratt. New York 18: McGraw-Hill Book Co., 1944. 
Pp. 233. $2.50. 

The scientific selection of salesmen. J. L. Rosenstein. New York: McGraw-Hill 
Book Co., 1944. Pp. 259. $3.00. 

Freud: Master and friend. Hanns Sachs. Cambridge: Harvard Univ. Press, 
1944. Pp. 195. $2.50. 

On measurement of motor skills. E. M. Schroeder. New York: King’s Crown 
Press, 1945. Pp. 256. $2.25. 

Developing a student guidance program in an instructional department. Scott, 
Morgan, and Lehman. Columbus 10: Ohio State Univ. Press, 1945. Pp. 
65. $.50. 

Elementary educational psychology. C. E. Skinner, editor. New York 11: 
Prentice-Hall, Inc., 1944. Pp. 448. $3.75. 

The handbook of industrial psychology. May Smith. Philosophical Library, 
15 E. 40th St., New York, N. Y. Pp. 304. $5.00. 

Role analysis and audience structure. Zerka Toeman. New York 17: Beacon 
House, 1944. Psychodrama Monographs, No. 12. Pp. 19. $1.25. 

Personnel relations: Their applications in a democracy. B. J. E. Walters. New 
York: Ronald Press Co., 1945. Pp. 547. $4.50. 

First course in psychology. R. S. Woodworth and M. R. Sheehan. New York: 
Henry Holt, 1944. Pp. x + 445. 

Normal lives for the disabled. Edna Yost and Lillian M. Gilbreth. New York: 
Macmillan Co., 1944. Pp. 298. $2.50. 

Guide to the evaluation of educational experiences in the armed services. Compiled 
for the American Council on Education under the direction of G. P. Tuttle. 
$2.00 a set. Mail orders to 363 Administration Bldg., Urbana, Illinois. 

Music in industry: A manual on music for work and for recreation in business 
and industry. Industrial Recreation Association, Chicago, 1944. Pp. 64. 

Personnel records. Industrial Welfare Society, Inc., 14 Hobart Place, London, 
S.W. 1, England, 1944. Pp. 24. 2s. 

The Van Allyn job placement technique. National Institute of Vocational Re- 
search, 305 W. 8th St., Los Angeles 14, California. $10.00. 